Front-End ASICs for 3-D Ultrasound: From Beamforming to Digitization

Dissertation

for the purpose of obtaining the degree of doctor
at Delft University of Technology
by the authority of the Rector Magnificus, prof.dr.ir. T.H.J.J. van der Hagen,
Chair of the Board for Doctorates

to be defended publicly on
Tuesday 3 April 2018 at 12:30 o’clock

by

Chao CHEN

Master of Science in Electrical Engineering
Delft University of Technology, The Netherlands
born in Longhai, Fujian Province, P.R. China
This dissertation has been approved by the promotors.

Composition of the doctoral committee:
Rector Magnificus, chairperson
Prof. dr. ir. N. de Jong Delft University of Technology, promotor
Dr. ir. M.A.P. Pertijs Delft University of Technology, promotor

Independent members:
Prof. dr. ir. R. Dekker Delft University of Technology
Prof. dr. R. Puers Katholieke Universiteit Leuven, Belgium
Dr.ir. P.J.A. Harpe Eindhoven University of Technology
Prof. dr. S. Cochran University of Glasgow, United Kingdom
Dr. ir. Z. Yu Institut für Mikroelektronik Stuttgart, Germany
Prof. dr. K.A.A. Makinwa Delft University of Technology, reserve member

This thesis work is supported by the Dutch Technology Foundation (STW), which is part of the Netherlands Organization for Scientific Research (NWO), and which is partly funded by the Dutch Ministry of Economic Affairs, in the framework of the program MICA: “Miniature ultrasound probe for real-time three-dimensional imaging and monitoring of Cardiac interventions”.

Printed by Ridderprint BV | www.ridderprint.nl.
ISBN: 978-94-6299-940-4

Copyright © 2018 by Chao CHEN

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any other means, or stored in a database or retrieval system, without the prior written permission of the author.
致我爱的爸爸妈妈和晓靓

To my beloved parents and Xiaoliang
# Table of Contents

## Introduction

1.1 Background and Motivations
1.2 Basic Principles
1.3 Challenges
1.4 Context of the Research
1.5 Thesis Organization
References

## Low-noise Amplifiers for Ultrasound

2.1 Architecture Choices
2.2 A Compact, Low-power LNA for Piezoelectric Transducers
  2.2.1 Introduction
  2.2.2 LNA Architecture
  2.2.3 Circuit Implementation
  2.2.4 Experimental Results
  2.2.5 Conclusions
2.3 A Single-Cable LNA Readout IC for PVDF Transducer
  2.3.1 Introduction
  2.3.2 Prototype Assembly
  2.3.3 Readout IC Design
  2.3.4 Experimental Results
  2.3.5 Conclusions
References

## PZT Matrix with Integrated Receive ASIC

3.1 Introduction
3.2 Methods
  3.2.1 Transducer Matrix on CMOS
  3.2.2 Micro-beamforming
3.3 Implementation of the Receive ASIC
  3.3.1 Front-end Amplifiers
  3.3.2 Micro-beamformer
  3.3.3 Auxiliary Circuits

  5.3.1 AFE
  5.3.2 Charge-reference Generation
  5.3.3 SAR Logic
  5.3.4 Dynamic Comparator
  5.3.5 CDR and FIFO
  5.3.6 DLL
5.4 Experimental Results
  5.4.1 Electrical Measurements
  5.4.2 Acoustic Measurements
5.5 Conclusions
References

## Conclusions

6.1 Main Contributions
6.2 Main Findings
6.3 Future Work
References

Summary
Samenvatting
List of Abbreviations
List of Publications
Acknowledgements
About the Author

CHAPTER 1

INTRODUCTION

1.1 Background and Motivations

The use of sound as a diagnostic tool in cardiology can be traced back to ancient Greece. Medical professionals learned about a patient’s condition by listening to the chest, originally with the bare ear and later with the help of stethoscopes. Although the word “stethoscope” literally means “looking into the chest” [1], visualization of the human heart with the aid of sound only became possible after the invention of echocardiography in 1953, when Inge Edler and Carl Hertz made their first successful attempt at using inaudible high-frequency sound waves, i.e. ultrasound, to create images of the heart [2]. After half a century of evolution, echocardiography has been established as an indispensable imaging modality for cardiologists, while technological innovations, mainly driven by engineers, continue to reshape this technique and expand its application scope.

In the 1970s, the introduction of transesophageal echocardiography (TEE) changed the medical practitioners’ perspective from “looking into the chest” to “looking inside the chest”. In contrast to the routine echocardiography examination method, i.e. transthoracic echocardiography (TTE), which gives limited access to the cardiac structures due to the interference of ribs and lungs, the TEE approach employs a miniature ultrasound probe that can be swallowed by the patient and passed into the esophagus (Figure 1.1). Because the heart is directly adjacent to the esophagus wall, such probes can operate with higher-frequency ultrasound, thus enabling heart visualization with superior spatial resolution [3]. Similar technologies have also been applied in other medical imaging regimes, such as transrectal and transvaginal imaging.

The further miniaturization of ultrasound probes continued to expand the vision of cardiologists. Even before the invention of TEE, catheter-based ultrasound devices had been used to investigate the inner structure of the heart [4]. As shown in Figure 1.1, an ultrasound transducer is mounted at the tip of a catheter and passed into the right heart chambers to capture intracardiac images, thus enabling “looking inside the heart”. Such devices, called **intracardiac echocardiography (ICE)** probes, have been widely used to guide interventional cardiovascular procedures. Similarly, the interest in detecting and visualizing vulnerable plaques within the coronary arteries has stimulated the development of another catheter-based device family, namely **intravascular ultrasound (IVUS)** probes.

The *trans-* and *intra-* device families [5] are collectively referred to as miniature ultrasound probes in this thesis. They share the same physics as their counterparts for external use (*e.g.* obstetric ultrasound probes): an ultrasound wave with a desired frequency is transmitted into the body, and the resulting echoes are received, whose intensity and travel time are extracted for image reconstruction. This process is illustrated in Figure 1.2. A cross-sectional (2-D) image in the azimuth direction (x-z) can be obtained with a 1-D transducer array (Figure 1.2a). The capture of a volumetric (3-D) image, however, requires either a moving 1-D array (Figure 1.2b) or a static 2-D array (Figure 1.2c). The former approach combines multiple cross-sectional images (slices) to form a 3-D image by mechanically translating or rotating the 1-D array in the elevation direction [6]. However, the limited speed of the transducer motion restricts the frame-acquisition rate (typically ~10 seconds per frame [5]), thus precluding real-time visualization. In contrast, a 2-D array, as shown in Figure 1.2c, has the added capability of steering and/or dynamically focusing in the elevation direction. Therefore, one can use a 2-D array to perform real-time pyramidal scanning without any mechanical translation and produce a volumetric dataset, allowing simultaneous display of images in any desired plane. This is referred to as real-time 3-D imaging [7], or 4-D imaging [8], as the time axis is also involved.

---

1 This refers to endocavity probes only. For example, transthoracic echocardiography (TTE) and transcranial probes are not ‘miniature’.

**Figure 1.1.** Conceptual illustration of transesophageal echocardiography (TEE) and intracardiac echocardiography (ICE)

![Figure 1.2. Illustration of different transducer arrays](image)

It is clear that real-time 3-D imaging has significant clinical value for the above-mentioned cardiac imaging applications involving miniature ultrasound probes. For example, much of the key information for accurate diagnosis of cardiac diseases and quality control of interventional surgeries, such as aortic dissections and leakage of valves, concerns dynamic 3-D phenomena that are difficult to interpret from 2-D or static 3-D images. For electrophysiology and valve-replacement surgeries, continuous real-time visual feedback for surgeons is critical to ensure the success of the procedure, which is only possible with the involvement of real-time 3-D TEE or ICE probes.

2 The 3-D image reconstruction and rendering in either software or hardware using the 3-D volume dataset still introduces some latency.

However, there is still a dramatic gap between the urgent clinical demand and the current technological capabilities. To date, state-of-the-art commercial miniature probes involving fully-sampled 2-D transducer arrays are only available for TEE, such as the Philips X7-2t [9] and the GE 6VT-D [10]. A common limitation of these probes is their large head volume (on the order of 10 cm³), which is liable to increase patient discomfort or even cause intolerance [11, 12]. In addition, imaging systems equipped with such probes (such as the Philips iE33 and GE Vivid-E9) are only able to scan a full volume over several heartbeats, and are hence not truly real-time.

The ultimate objective of this thesis work is to explore the enabling technologies
to bridge this technology gap, with the focus on the implementation of ultrasound
receivers. Through the combination of existing in-probe ultrasound signal
processing approaches and several technical innovations ranging from new
integration methods to advanced integrated circuit techniques, a promising
solution that paves the way towards the next-generation miniature 3-D ultrasound
probes has been obtained. This has been demonstrated by the realization of
several prototypes, which will be described and discussed in the following
chapters.

1.2 Basic Principles

The main technical obstacle in the development of real-time 3-D miniature probes has been clearly recognized since their inception [13]. Table 1.1 summarizes the typical physical parameters of 3-D miniature probes, which clearly illustrate the assembly challenge. In conventional 2-D probes, the individual wiring to linear or phased array elements is established by fine-gauge micro-coaxial cables with a typical outer diameter ranging from 0.18 to 0.3 mm. As the number of transducer elements increases with the adoption of a 2-D array, accommodating the required number of cables within gastroscopic tubes or catheters becomes unrealistic [7].

Early efforts to address this problem [14, 15, 16] focused on undersampling the 2-D aperture with sparse arrays. The basic idea is to select a small fraction of elements from a 2-D array with either a periodic [16] or a random [14, 15] distribution, thus reducing the required number of interconnects. However, this comes at the cost of compromised image quality, as the reduced number of transmit or receive transducer elements in use inevitably leads to a degradation of the signal-to-noise ratio as well as elevated grating-lobe and side-lobe levels [5]. Such compromises have hampered the clinical acceptance of 3-D imaging with sparse 2-D array transducers.

A practical implementation of fully-sampled 2-D arrays in miniature probes was only enabled after the proposal of subarray receive beamforming\(^3\) [17], which achieves an order-of-magnitude cable reduction with the aid of in-probe electronics. The basic principle of subarray beamforming is explained in Figure 1.3. In conventional 2-D imaging with 1-D phased-array transducers (Figure 1.3a), receive beamforming is applied in the back-end imaging system to form an acoustic receive beam focused at, or steered to, any desired point in the target imaging plane. This is achieved by applying appropriate electrical delays to the echo signals received by the individual transducer elements and coherently summing them to create a beam. Ideally, an $N:1$ cable reduction for an $N$-element array could be readily achieved by migrating the delay-and-sum (DAS) electronics, a.k.a. the beamformer, into the probe. However, as the required maximum delay increases in proportion to the linear dimension of the aperture ($N$ for both an $N$-element 1-D array and an $N^2$-element 2-D array), implementing a single-stage beamformer for a 2-D array with 100+ or even 1000+ elements (Table 1.1) would lead to prohibitively long electrical delay lines and hence unacceptable in-probe hardware cost. The concept of subarray beamforming, as shown in Figure 1.3b, was proposed to address this problem. It splits the delay for individual elements into two stages: a fine delay with a small step size but a short range, and a coarse delay with a large step size that covers the full delay range. As such, the required number of delay taps in both stages can be minimized, which significantly simplifies the electronics design. The fine-delay stage can be implemented in the probe with a much more affordable power and area budget, and the coarse-delay stage can be realized in the back-end system that interfaces to the probe with a dramatically reduced number of signal-acquisition channels (typically an order of magnitude lower than $N^2$ [8, 17, 18]).

**Table 1.1.** Typical physical parameters of 3-D miniature probes

\begin{table}
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
Application & TEE & ICE & IVUS \\
\hline
Transducer aperture dimension (2-D) & 5-10 mm & 2-5 mm & 1-2 mm \\
\hline
Typical center frequency & 3-5 MHz & 5-10 MHz & 10-20 MHz \\
\hline
Number of elements & $>$ 1000 & 100 - 1000 & $<$ 100 \\
\hline
Gastroscopic tube/catheter diameter & $\sim$ 5-7 mm & $\sim$ 3 mm & $<$ 2 mm \\
\hline
Imaging depth & $>$ 10 cm & 5-10 cm & $<$ 5 cm \\
\hline
\end{tabular}
\end{center}
\end{table}

\(^3\) Also referred to as “micro-beamforming”, “presteering” or “subaperture processing” by different authors.
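The two-stage delay split described above can be sketched numerically. The following is a minimal illustration only, assuming a 1-D array, plane-wave steering and a sound speed of 1540 m/s; the function names, the 32-element array, the 150 µm pitch and the subarray size of 4 are hypothetical choices, not parameters taken from this thesis.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, assumed typical soft-tissue value


def steering_delays(n_elements, pitch_m, angle_deg, c=SPEED_OF_SOUND):
    """Per-element delays (s) for steering a 1-D array (plane-wave approximation)."""
    s = math.sin(math.radians(angle_deg))
    raw = [i * pitch_m * s / c for i in range(n_elements)]
    t0 = min(raw)
    return [t - t0 for t in raw]  # shift so that all delays are non-negative


def split_delays(delays, subarray_size):
    """Split each delay into a coarse part shared by a subarray and a fine residual."""
    coarse, fine = [], []
    for start in range(0, len(delays), subarray_size):
        group = delays[start:start + subarray_size]
        base = min(group)                    # coarse delay, applied once per subarray
        coarse.append(base)
        fine.extend(d - base for d in group)  # fine delay, applied per element
    return coarse, fine


# Hypothetical example: 32 elements, 150 um pitch, 30 degree steering, subarrays of 4
delays = steering_delays(32, 150e-6, 30.0)
coarse, fine = split_delays(delays, 4)
print(f"full delay range: {max(delays) * 1e9:.0f} ns")
print(f"fine delay range: {max(fine) * 1e9:.0f} ns")
print(f"channels to back-end: {len(coarse)} instead of {len(delays)}")
```

In this sketch the fine delays span only the range needed within one subarray, while the coarse delays cover the full aperture but are needed only once per subarray, which mirrors the reasoning behind the reduced number of delay taps.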

The technical foundation of subarray beamforming is the realization of high-density integrated electronics in proximity to the 2-D transducer. In commercial bed-side ultrasound imaging systems [9], the beamforming electronics are built using off-the-shelf multi-channel integrated chips [19] or chipsets [20]. For miniature 3-D probes, however, the extremely limited volume of the probe tip precludes such options. As an alternative, the use of custom-designed front-end application-specific integrated circuits (ASICs) allows close integration with a 2-D ultrasound transducer as well as performance optimization of the interfacing circuits. As such, it has now become a compelling and indispensable solution for implementing in-probe electronics.

Figure 1.4 illustrates a simplified block diagram of a typical front-end ASIC based on the subarray beamforming framework. Each subarray of the ASIC consists of high-voltage transmitters, which drive the transducer to generate acoustic waves, and low-voltage receivers, which perform both signal conditioning and beamforming on the received echoes. Typically, the signal conditioning circuit incorporates a wide range (> 40 dB) of programmable gain levels to compensate for the propagation attenuation of ultrasound waves, which is crucial for enhancing the dynamic range.

**Figure 1.4.** Simplified schematic of ultrasound front-end electronics based on the subarray beamforming framework
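To give a feeling for why a programmable gain range of tens of dB is needed, the round-trip propagation attenuation can be estimated with a simple linear model. This is a rough sketch only: the attenuation coefficient of 0.5 dB/(cm·MHz) is a commonly quoted soft-tissue rule of thumb and an assumption here, not a value from this thesis, and the function names are hypothetical.

```python
ALPHA_DB_PER_CM_MHZ = 0.5   # assumed soft-tissue rule of thumb, not from the thesis
SPEED_OF_SOUND = 1540.0     # m/s, assumed typical soft-tissue value


def round_trip_attenuation_db(depth_cm, freq_mhz, alpha=ALPHA_DB_PER_CM_MHZ):
    """Attenuation (dB) of an echo from depth_cm: the wave travels there and back."""
    return 2.0 * alpha * freq_mhz * depth_cm


def echo_depth_cm(time_us, c=SPEED_OF_SOUND):
    """Depth (cm) of the reflector whose echo arrives time_us after transmission."""
    return c * time_us * 1e-6 / 2.0 * 100.0


# Gain needed over time to keep the echo amplitude roughly constant (5 MHz example)
for t_us in (13, 65, 130):
    depth = echo_depth_cm(t_us)
    gain = round_trip_attenuation_db(depth, 5.0)
    print(f"t = {t_us:3d} us, depth = {depth:4.1f} cm, gain = {gain:4.1f} dB")
```

Under these assumptions, echoes from a depth of about 10 cm at 5 MHz are roughly 50 dB weaker than those from the surface, which is of the same order as the gain range mentioned above.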

1.3 Challenges

The small form factor of miniature 3-D ultrasound devices, in both the physical size and the heat dissipation budget, defines several stringent physical boundaries for the design of front-end ASICs. The engineering and commercial success of a miniature 3-D ultrasound device depends, to a significant degree, on how well its built-in front-end ASIC adapts to these physical constraints.

The first challenge comes from the dense electrical interconnections between the ASIC and the 2-D array transducer. In contrast to conventional approaches based on interposer layers [21], direct transducer-on-chip integration is desired, as it not only helps in downsizing the probe tip, but also minimizes the parasitic capacitance added to each transducer element. This calls for an element-matched ASIC layout, with a pitch identical to that of the transducer elements. Ideally, the pitch of a 2-D transducer array should not be greater than half of the ultrasound wavelength ($\lambda/2$) at the center frequency. This requirement ensures a sufficient spatial sampling frequency to avoid elevated grating lobes in the obtained image [5]. This leads to an element pitch of 200 $\mu$m or less for typical medical imaging applications, calling for a highly compact circuit implementation underneath each transducer element.
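The half-wavelength pitch criterion can be checked with a few lines of arithmetic. The sketch below assumes a sound speed of 1540 m/s in soft tissue, which is not stated in the text, and evaluates the center frequencies listed in Table 1.1; the function name is hypothetical.

```python
SPEED_OF_SOUND = 1540.0  # m/s, assumed typical soft-tissue value


def max_pitch_um(center_frequency_hz, c=SPEED_OF_SOUND):
    """Largest element pitch (um) that satisfies the lambda/2 criterion."""
    wavelength_m = c / center_frequency_hz
    return 0.5 * wavelength_m * 1e6


for f_mhz in (3, 5, 10, 20):
    print(f"{f_mhz:2d} MHz -> pitch <= {max_pitch_um(f_mhz * 1e6):.0f} um")
```

With this assumed sound speed, a pitch of 200 µm corresponds to a center frequency of about 3.85 MHz, and the allowed pitch shrinks further at the higher frequencies used for ICE and IVUS.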

Another significant concern is the electrical power consumption. As the front-end ASIC, along with the transducer, dissipates heat in the probe tip, tight control of its operating power consumption and the associated temperature rise is crucial for avoiding tissue overheating. The FDA regulations [22] specify the maximum allowed surface temperature for TEE probes, which sets an upper bound on both the maximum transmit power and the self-heating power associated with in-probe electronics. In general, the maximum allowed heat dissipation of a standard 2-D endoscopic probe (TEE) is estimated at 1.0 W [50], and this number does not scale with the number of transducer elements. Therefore, front-end ASICs in miniature 3-D probes have to operate with an even stricter per-element power budget owing to the increased number of elements. For instance, for a 1000-element array in a 3-D TEE probe, the average power dissipation of the circuits interfacing with each element should be limited to about 1 mW, demanding superior power efficiency in the circuit implementation.
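The per-element budget follows directly from dividing the fixed probe-level heat budget by the element count. A minimal sketch, using the 1.0 W estimate quoted above; the function name and the element counts other than 1000 are hypothetical illustrations.

```python
def per_element_budget_mw(total_budget_w, n_elements):
    """Average power budget per element (mW) for a fixed probe-level heat budget."""
    return total_budget_w / n_elements * 1e3


# The 1.0 W budget does not scale with the element count, so the
# per-element share shrinks as the array grows.
for n in (128, 1000, 2000):
    print(f"{n:5d} elements -> {per_element_budget_mw(1.0, n):.2f} mW per element")
```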

Subarray beamforming makes it possible to reduce the channel count by approximately an order of magnitude, but the pursuit of further channel reduction never ends. The power and area limitations have so far restricted the mainstream development of in-probe on-chip signal processing to the analog domain. While analog subarray beamforming circuits have shown their advantages in achieving better power-efficiency [17], further processing and transmission of the output signals in analog form can be problematic owing to their poor immunity to circuit non-idealities and environmental interference. This problem is exacerbated when more aggressive channel reduction has to be accomplished within the probe. While a few efforts have been made to improve the efficiency of analog-domain multiplexing, a thorough solution to this problem is to digitize the received signals in the probe and perform the channel sharing in the robust digital domain. However, this strategy has long been considered impractical [23], as migrating standard analog-to-digital (A/D) conversion topologies for typical sensor systems to ultrasound ASICs would lead to unacceptable power and area overhead. As such, dedicated A/D conversion solutions for ultrasound are called for, which should save power and area by exploiting the unique features of ultrasound transducers and systems, e.g. by merging the A/D converter with the beamformer and signal conditioning circuits, or by taking advantage of the resonant nature of the transducer element.

Figure 1.5 summarizes the above-mentioned challenges and their interrelations. To address these challenges, both system-level innovations and circuit-level optimizations are required in the development of ultrasound front-end integrated circuits. This is an emerging field that requires a solid understanding of both ultrasound transducer physics and solid-state circuit design, which motivates this thesis work.

**Figure 1.5.** Main challenges in this thesis work

1.4 Context of the Research

The design and fabrication of 2-D transducer arrays for volumetric imaging started in the late 1980s [24]. Much of the pioneering work was done by Smith et al. [7, 13] at Duke University. The lack of feasible dense-interconnection solutions forced scientists to adopt sparse arrays at the cost of image quality, which diminished the clinical value of 2-D array transducers. In fact, it did not take long for designers to realize the necessity of implementing in-probe signal processing functions. In 1989, Larson et al. proposed a system-level methodology called distributed phasing [25], which is very similar to the concept of subarray beamforming proposed by Philips [17]. However, given the strict power and performance constraints of building 2-D arrays in miniature probes, these ideas were only turned into reality with the advances of modern IC technology in the last two decades.

Prior to the introduction of 2-D arrays, custom-designed integrated chips had been developed to enable electronic steering of linear arrays in devices with extreme size constraints. Black et al. [51] reported the first in-probe ASIC chipset in 1994 for use in an intravascular microprobe. Fabricated in a 3-µm CMOS process, this 4-die chipset was designed to interface with 64 PVDF transducer elements in a linear array. The limited element count, however, relaxed the power and area constraints and allowed the adoption of standard wire-bonding techniques to simplify the chip assembly.

The continuous downscaling of CMOS technology provided the possibility for more aggressive integration. Beginning in the early 2000s, a number of research groups made substantial efforts to implement integrated circuits directly underneath the 2-D transducer array, thus addressing the dense-interconnection issue. Interestingly, this progress came along with the development of another silicon-based technology, namely capacitive micromachined ultrasound transducers (CMUTs) [26, 27]. In 2002, Noble et al. presented the first 2-D CMUT array co-integrated with analog receive amplifiers built in 0.8-µm CMOS [28]. The transmit functionality was later enabled by Daft et al. by monolithically integrating the CMUT with high-voltage switches [29]. Wygant et al. designed and demonstrated a 16×16 CMUT array that was flip-chip bonded to a 0.25-µm CMOS IC incorporating both high-voltage pulsers and preamplifiers [30, 31]. The focus of the research at the time was the integration approach and the performance characterization of the transducer-to-CMOS interface, while the circuit topology and functionality were both kept relatively simple.

The past decade has seen the realization of more advanced signal processing functions in ultrasound front-end ICs. Transmit beamformers were introduced by Wygant et al. to produce steered and focused ultrasound beams from a 2-D CMUT array [32]. Based on this work, Bhuyan et al. reported the successful implementation of large aperture 2-D arrays with $32 \times 32$ elements by using different transducer-on-chip assembly approaches [21, 33]. Later on, a column-row-parallel ASIC architecture was proposed and demonstrated by Chen et al. to enable flexible 3-D beam-formation [34].

Concurrent with the above advances in transducer-on-chip developments, researchers were also investigating the potential optimization of circuit implementations. Receive beamforming is a challenging ultrasound signal processing function for on-chip realization due to the necessity of maintaining the dynamic range of the received information. A variety of analog circuit topologies have been proposed in recent years in pursuit of a compact and power-efficient receive beamformer, such as all-pass filters [35, 36], switched-current circuits [37, 38] and time-interleaved switched-capacitor circuits [18, 39]. The latter approach offers the best power-efficiency for modest delay resolutions, as the majority of its power is dissipated in the digital domain. On the other hand, implementing the beamforming function completely digitally is expected to yield an even higher efficiency, as the generation and control of digital delays can be much more accurate and flexible. However, such an approach requires an analog-to-digital converter (ADC) for each transducer element, resulting in significant power and area overhead as well as design challenges. Chen et al. [40] attempted to address these issues by leveraging element-level $\Delta \Sigma$ modulators in a nanoscale CMOS process, but the results were not yet promising. As an alternative, Um et al. [41] proposed an analog-digital-hybrid beamformer architecture to reduce the required number of ADCs, which is also compatible with the subarray beamforming framework.

Besides these system-level explorations, innovative circuit-level solutions for ultrasound IC building blocks have kept emerging in recent years. These contributions cover the design of front-end amplifiers [42, 43], time-gain compensation circuits [44, 45, 46], high-voltage pulsers [42, 47] and ADCs [48, 49].

1.5 Thesis Organization

This thesis is organized as follows.

In order to optimize the power efficiency of the front-end ASIC, an efficient way to read out the electrical signal produced by the ultrasound transducer is needed. This is developed in Chapter 2, which presents a design-oriented analysis of the optimal architecture choice for front-end amplifiers based on the transducer characteristics. Two design cases targeting different types of ultrasound transducers (PZT and PVDF) are reviewed and compared to evaluate the effectiveness of the proposed design methodology.

As addressed in Section 1.3, a reliable approach to realize a tight integration of the 2-D transducer array and the silicon chip is key for enabling miniature 3-D ultrasound probes. A PZT-on-CMOS integration scheme is proposed in Chapter 3 and demonstrated by a prototype assembly involving a 9 × 12 PZT matrix and an element-matched receive ASIC. This prototype served as the test vehicle not only for the integration scheme, but also for the subarray beamforming circuits integrated in the ASIC. The acoustical beamforming functionality was extensively evaluated by water-tank experiments, showing the effectiveness of this technique.

Based on this prototype, a full-blown 32 × 32 front-end ASIC with integrated transducer was developed, as described in Chapter 4. This ASIC incorporates both transmit and receive capabilities, thus enabling the demonstration of 3-D imaging experiments. With improved designs of all circuit building blocks, this ASIC achieves a record-low receive power consumption of 0.27 mW/element. To mitigate the non-idealities of analog subarray beamformers caused by device mismatches, a mismatch-scrambling technique is proposed and demonstrated in this ASIC, which helps to enhance the dynamic range.

Following the success of the 32 × 32 array ASIC, which operates in the analog domain, we attempted to push the technology frontier towards a digital miniature 3-D probe. In Chapter 5, a beamforming ADC architecture for achieving feasible in-probe digitization is proposed and demonstrated with a $9 \times 24$ prototype array. High-speed datalinks are employed in combination with subarray beamforming to realize a 36-fold channel-count reduction. This prototype achieves a per-element power consumption of 0.91 mW, which is $10 \times$ lower than that of prior work incorporating front-end digitization functions.

Chapter 6 concludes this thesis with discussions and a summary. A vision for the future improvements in both further channel reduction and more aggressive system-on-chip integration is given in a special section.

References




CHAPTER 2

LOW-NOISE AMPLIFIERS FOR ULTRASOUND

2.1 Architecture Choices

In most ultrasound receiver systems, a low-noise amplifier (LNA) is the building block that directly interfaces with the transducer element (possibly through a transmit/receive switch or limiter). By linearly amplifying the echo signals at the very beginning of the receive chain, the LNA reduces the noise contribution of the succeeding circuits, and is thus critical in optimizing the noise-power trade-off of the entire receiver system. Depending on the transducer characteristics and the required output signal type, such amplification can be embodied as a voltage gain [1], a current gain [2], a trans-impedance gain [3], or a trans-conductance gain [4]. Therefore, a transducer-oriented design strategy for ultrasound LNAs is called for.

Capacitive-feedback voltage amplifiers (CFA) [5] and trans-impedance amplifiers (TIA) [3, 6, 7] are the most commonly used ultrasound LNA architectures. This section aims to establish a general comparison of their noise-efficiency, which is expected to provide a guideline for the architecture choice of ultrasound LNAs.

---

4 There are also some exceptional cases. For example, the analog loop-filter of a ΔΣ modulator can be merged with the ultrasound transducer to construct a hardware-efficient front-end interface [27]. This is, however, beyond the scope of this thesis.

5 The definition of the noise-efficiency factor (NEF) for ultrasound LNAs follows K. Chen et al. [6]:

\[ \text{NEF} = V_{n,in} \cdot \sqrt{P_{LNA}} \]

where \( V_{n,in} \) is the input-referred noise spectral density averaged inside the passband and \( P_{LNA} \) is the total power consumption of the LNA.
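The NEF definition above can be evaluated directly. The sketch below is illustrative only: the function name and the numerical values are hypothetical and not measured results from this thesis. It also shows a property that makes this figure of merit convenient, under the assumption of a thermal-noise-limited amplifier whose noise density scales with the inverse square root of its power: trading power for noise then leaves the NEF unchanged.

```python
import math


def noise_efficiency_factor(vn_in, p_lna):
    """NEF = V_n,in * sqrt(P_LNA), with vn_in in V/sqrt(Hz) and p_lna in W."""
    return vn_in * math.sqrt(p_lna)


# Hypothetical designs: B spends 4x the power of A to halve the noise density
nef_a = noise_efficiency_factor(2e-9, 1e-3)   # 2 nV/sqrt(Hz) at 1 mW
nef_b = noise_efficiency_factor(1e-9, 4e-3)   # 1 nV/sqrt(Hz) at 4 mW
print(f"NEF A = {nef_a:.3e}, NEF B = {nef_b:.3e}")
```

A lower NEF therefore indicates a better noise-power trade-off rather than simply a lower noise level.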
The basic architectures of a CFA and a TIA are presented in Figure 2.1. As the basis for the following discussion, some assumptions are listed below:

- To achieve the optimal noise-power trade-off, capacitive feedback is used in both architectures, as a capacitive network is noise-free. Note that a capacitive-feedback TIA introduces a 90-degree phase shift to the received echo current and slightly reduces the natural resonance frequency [8], which should be taken into account in system-level design.

- A buffer stage, normally a source follower, is added after the operational transconductance amplifier (OTA) in the TIA to enforce accurate feedback, i.e. to provide a sufficiently low output impedance. The power consumption of this buffer stage, and the extra power that may be required for frequency compensation, should be taken into account when comparing the noise-efficiency with that of the CFA, in which no in-loop buffer stage is needed. For applications where the LNA directly interfaces with the external cable, an extra out-of-loop buffer stage would also be required for the CFA to drive the cable load. However, driving a heavy load using a buffer within the feedback loop probably still requires more power than doing the same outside the loop, because the buffer in the loop may have to provide a wider bandwidth to maintain loop stability. In other scenarios, e.g. an ADC sampler or a micro-beamformer following the LNA, the power consumption of the buffer stages should be carefully examined.

- A large feedback resistor is often needed in both architectures for DC biasing purposes, but is omitted in Figure 2.1 for simplicity.

A CFA senses the voltage from the transducer by creating a relatively high input impedance; in contrast, a TIA senses the current by establishing a low-input-impedance virtual ground. This difference makes the noise comparison of these two structures less than straightforward at first glance. In order to make a fair comparison, we use an ideal voltage amplifier as a reference, as shown in Figure 2.2. The ideal voltage amplifier model has a voltage gain of $G_V$ and an infinite input impedance $Z_{in}$. The input-referred noise of both structures shown in Figure 2.1 can be modeled by a voltage noise source $V_n^2$.

The noise-efficiency of ultrasound LNAs is strongly related to the characteristics of the transducer. In this thesis, we use the lumped Butterworth-Van Dyke electrical model [9] shown in Figure 2.3 to mimic the impedance of the transducer, which contains a voltage source $V_{in}$, a motion-branch impedance $Z_T$ formed by an RLC resonance tank, and a parasitic capacitor $C_p$. The effectiveness of this electrical model has been successfully demonstrated for a wide range of ultrasound transducers, such as piezoelectric transducers [1], capacitive micro-machined ultrasound transducers (CMUT) [10] and piezoelectric micro-machined ultrasound transducers (PMUT) [4]. By substituting this model into Figure 2.1 to assist the following noise analysis, the signal attenuation due to the finite input impedance of the CFA and the TIA can be taken into account.

6 Based on Norton’s theorem, the same model can also be interpreted as a current source $I_{in}$ in parallel with a motion-branch impedance and a parasitic capacitor, as presented in [1].

Next, we will calculate the equivalent voltage noise density $V_n^2$ referred to the voltage source ($V_{in}$) in the transducer model, for both CFA and TIA at the frequency of interest (within a unit bandwidth). To do so, the following assumptions are made:

- The OTA in both structures has an input-referred voltage noise density $V_{n,OTA}^2$;
- The OTA in both structures has a sufficiently large open-loop gain ($A$) to enforce accurate feedback, and a sufficiently wide bandwidth to ensure that the examined frequency point is inside the passband;
- The input parasitic capacitance of the OTA itself is not considered in the following analysis.

**CASE-I: TIA:**

The input-referred current noise of the TIA can be calculated as:

\[
i_{in}^2 = V_{n,OTA}^2 \, \omega^2 (C_{F2} + C_p)^2 \tag{2.1}
\]

Taking the trans-impedance gain as the impedance of $C_{F2}$, we have the TIA output voltage noise as:

\[
V_{n,out,TIA}^2 = i_{in}^2 \cdot \frac{1}{\omega^2 C_{F2}^2} = \left( \frac{C_{F2} + C_p}{C_{F2}} \right)^2 V_{n,OTA}^2 \tag{2.2}
\]

Hence the equivalent input-referred voltage noise, when referred to the voltage source in the transducer model, is:

\[
V_{n,TIA}^2 = \frac{1}{G_{V,TIA}^2} \left( \frac{C_{F2} + C_p}{C_{F2}} \right)^2 V_{n,OTA}^2 \tag{2.3}
\]

where the equivalent voltage gain of the TIA can be calculated as:

\[
G_{V,TIA} = \frac{1}{\omega C_{F2} \left[ Z_T \left( 1 + \omega C_p Z_{in,TIA} \right) + Z_{in,TIA} \right]} \tag{2.4}
\]

where \( Z_{in,\, TIA} \) is the actual input impedance of the TIA, which approximately equals:

\[
Z_{in,\, TIA} = \frac{1}{(1 + A)\omega C_{F2}} \tag{2.5}
\]
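
As a brief sketch of where the bracketed term in the TIA voltage gain comes from (magnitudes only, as in the rest of this analysis, and assuming the transducer model of Figure 2.3 is loaded by \( Z_{in,\, TIA} \) in parallel with \( C_p \)): the source current splits between \( C_p \) and the TIA input, so the current entering the TIA is

\[
I_{TIA} = \frac{V_{in}}{Z_T + \dfrac{Z_{in,TIA}}{1 + \omega C_p Z_{in,TIA}}} \cdot \frac{1}{1 + \omega C_p Z_{in,TIA}} = \frac{V_{in}}{Z_T \left( 1 + \omega C_p Z_{in,TIA} \right) + Z_{in,TIA}}
\]

Multiplying this current by the trans-impedance \( 1 / (\omega C_{F2}) \) and dividing by \( V_{in} \) gives the equivalent voltage gain. The symbol \( I_{TIA} \) is introduced here only for this illustration.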

**CASE-II: CFA**

The input-referred voltage noise of the CFA can be calculated as:

\[
V_{n,in,\, CFA}^2 = \left( 1 + \frac{C_{F1}}{C_I} \right)^2 V_{n,\, OTA}^2 \tag{2.6}
\]

where \( 1 + \frac{C_{F1}}{C_I} \) is the famous “noise gain factor” of the CFA structure.

Taking the mid-band voltage gain as \( \frac{C_I}{C_{F1}} \), the CFA output voltage noise is:

\[
V_{n,\, out,\, CFA}^2 = \left( \frac{C_I}{C_{F1}} \right)^2 \left( 1 + \frac{C_{F1}}{C_I} \right)^2 V_{n,\, OTA}^2 = \left( 1 + \frac{C_I}{C_{F1}} \right)^2 V_{n,\, OTA}^2 \tag{2.7}
\]

Therefore, the equivalent input-referred voltage noise of a CFA (when referred to the voltage source in the transducer model) is:

\[
V_{n,\, CFA}^2 = \frac{1}{G_{V,\, CFA}^2} \left( 1 + \frac{C_I}{C_{F1}} \right)^2 V_{n,\, OTA}^2 \tag{2.8}
\]

where the equivalent voltage gain of the CFA is:

\[
G_{V,\, CFA} = \frac{1}{Z_T(\omega C_p + \omega C_I) + 1} \frac{C_I}{C_{F1}} \tag{2.9}
\]

To compare the noise efficiency, we assume that:

\[
G_{V,\, TIA} = G_{V,\, CFA} \tag{2.10}
\]

From equations (2.3) and (2.8) we have:

\[
\frac{V_{n,\, TIA}^2}{V_{n,\, CFA}^2} = \left( \frac{1 + \frac{C_p}{C_{F2}}}{1 + \frac{C_I}{C_{F1}}} \right)^2 \tag{2.11}
\]

where the parameters \( \frac{C_p}{C_{F2}} \) and \( \frac{C_I}{C_{F1}} \) are constrained by equations (2.4), (2.9) and (2.10).

Equation (2.11), along with equations (2.4), (2.9) and (2.10), can be used as a simple criterion for the selection of ultrasound LNA architectures when the power consumption is purely noise-limited (e.g. in high-precision low-frequency ultrasound applications). For example, if

\[
Z_T \gg \frac{1}{(1+A)\omega C_{F2}}, \quad C_{F2} \gg \frac{C_p}{1+A} \tag{2.12}
\]

equation (2.11) can be simplified as:

\[
\frac{V_{n,TIA}^2}{V_{n,CFA}^2} = \left( 1 - \frac{C_I + \frac{1}{\omega Z_T}}{C_{F2} + C_p + C_I + \frac{1}{\omega Z_T}} \right)^2 < 1 \tag{2.13}
\]

which suggests that the noise efficiency of the TIA would always be better than that of the CFA. This result agrees with intuition, because the condition (2.12) implies a high-impedance transducer with a relatively small parasitic capacitance. On the other hand, a large transducer parasitic capacitor \(C_p\) will make the TIA structure less attractive.
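
The trend described above can be explored numerically. The following is a minimal sketch, assuming the high-impedance condition holds and treating \(Z_T\) as a real magnitude; the bracket is squared to stay consistent with the squared noise ratio of equation (2.11). The function name and all element values are hypothetical and chosen only for illustration.

```python
import math


def tia_to_cfa_noise_ratio(c_i, c_f2, c_p, z_t, freq_hz):
    """Noise power ratio V_n,TIA^2 / V_n,CFA^2 in the high-impedance limit.

    All capacitances are in farads, z_t in ohms, freq_hz in hertz.
    """
    omega = 2.0 * math.pi * freq_hz
    extra = c_i + 1.0 / (omega * z_t)  # C_I + 1/(omega * Z_T)
    bracket = 1.0 - extra / (c_f2 + c_p + extra)
    return bracket ** 2


# Hypothetical element values (not taken from the thesis).
small_cp = tia_to_cfa_noise_ratio(c_i=1e-12, c_f2=100e-15, c_p=0.5e-12,
                                  z_t=20e3, freq_hz=5e6)
large_cp = tia_to_cfa_noise_ratio(c_i=1e-12, c_f2=100e-15, c_p=10e-12,
                                  z_t=20e3, freq_hz=5e6)
print(f"ratio with C_p = 0.5 pF: {small_cp:.3f}")
print(f"ratio with C_p = 10 pF:  {large_cp:.3f}")
```

The ratio stays below one for any positive element values, but it moves towards one as \(C_p\) grows, which is the sense in which a large parasitic capacitance erodes the noise advantage of the TIA.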

In the above analysis, we assume that the power consumption of the LNA is limited by the noise requirement, rather than its bandwidth. When we take the bandwidth into account, the impact of \(C_p\) becomes even more significant. \(C_p\) introduces an input pole in the TIA, which limits the bandwidth of the TIA to:

\[
\text{BW}_{TIA} = \frac{g_m}{2\pi \left( C_p + C_L + \frac{C_p C_L}{C_{F2}} \right)} \tag{2.14}
\]

while the bandwidth of the CFA is approximately:

\[
\text{BW}_{CFA} \approx \frac{g_m}{2\pi \, C_I C_L / C_{F1}} \tag{2.15}
\]

where \(g_m\) is the transconductance of the OTA and \(C_L\) is the load capacitance. Equations (2.14) and (2.15) emphasize that, for high-frequency ultrasound transducers with a large parasitic capacitance, the CFA is more advantageous than the TIA in terms of power efficiency.
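
To make the bandwidth argument concrete, the two expressions can be evaluated side by side. This is a minimal sketch, assuming the TIA bandwidth denominator is \(2\pi (C_p + C_L + C_p C_L / C_{F2})\), which is what a single-stage OTA with feedback factor \(C_{F2}/(C_{F2}+C_p)\) gives; the transconductance and capacitor values are hypothetical.

```python
import math


def bw_tia(gm, c_p, c_l, c_f2):
    """Closed-loop TIA bandwidth in Hz for a single-stage OTA."""
    return gm / (2.0 * math.pi * (c_p + c_l + c_p * c_l / c_f2))


def bw_cfa(gm, c_i, c_l, c_f1):
    """Approximate closed-loop CFA bandwidth in Hz (mid-band gain C_I/C_F1)."""
    return gm / (2.0 * math.pi * c_i * c_l / c_f1)


# Hypothetical design point (not taken from the thesis).
GM = 1e-3        # OTA transconductance, S
C_L = 1e-12      # load capacitance, F
C_F2 = 100e-15   # TIA feedback capacitance, F
C_I, C_F1 = 1e-12, 100e-15  # CFA capacitors, gain of 10

for c_p in (0.5e-12, 2e-12, 10e-12):
    print(f"C_p = {c_p * 1e12:4.1f} pF: "
          f"BW_TIA = {bw_tia(GM, c_p, C_L, C_F2) / 1e6:6.2f} MHz, "
          f"BW_CFA = {bw_cfa(GM, C_I, C_L, C_F1) / 1e6:6.2f} MHz")
```

In this simplified model the CFA bandwidth does not depend on \(C_p\), while the TIA bandwidth falls as \(C_p\) grows, so the TIA needs a larger \(g_m\), and hence more current, to hold the same bandwidth.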

In the following sections, the outcome of the above analysis is applied in the LNA design for specific types of ultrasound transducers, namely piezoelectric transducers and PVDF transducers, which exhibit different impedance characteristics. A variety of circuit design techniques are also introduced in both case studies to further improve their power and area efficiency.

## 2.2 A Compact, Low-power LNA for Piezoelectric Transducers

### 2.2.1 Introduction

The increasing clinical need for better visualization of human organs, such as the valves and chambers of the heart, calls for the development of miniature endoscopic and catheter-based ultrasound probes [3, 11, 12]. Such probes will be capable of providing the physician with valuable real-time 3-D images for diagnostic purposes and for guiding interventional procedures, while being more patient-friendly and cost-effective than alternative imaging techniques. The acquisition of 3-D images requires a 2-D transducer array that consists of thousands of elements, and an associated front-end integrated circuit that connects the transducer array with the external imaging system. The strict constraints on the size and power dissipation of the probe tip present a challenge for the design of the front-end electronics. Hence, high power-efficiency and small silicon area become the main design targets for front-end circuits in such probes.

Figure 2.4 shows the architecture of a front-end integrated circuit interfaced with a 2-D piezoelectric transducer array [11]. Compared to capacitive micro-machined ultrasonic transducers (CMUTs), bulk piezoelectric transducers, typically based on PZT, offer a higher sensitivity without the need for a high DC bias voltage, while having a narrower bandwidth [10]. To simplify the implementation of the electronics, we adopt an architecture in which the 2-D transducer is divided into a receive subarray with associated integrated receive circuitry, and a smaller transmit subarray directly wired to an imaging system [11]. The techniques presented in this section, however, are equally applicable to architectures that include local transmit circuits. The receive circuitry for each transducer element consists of a low-noise amplifier (LNA), a programmable-gain amplifier for time-gain compensation (TGC) and delay lines for local beamforming. Among these building blocks, optimizing the LNA is usually the key to minimizing power consumption because, in most cases, the majority of the power in the receive circuitry is consumed by the LNA to arrive at an input-referred noise level that is small compared to the transducer’s noise.

---

7 This section is based on publication “A Compact 0.135-mW/Channel LNA Array for Piezoelectric Ultrasound Transducers,” in Proc. ESSCIRC 2015, Sept. 2015, pp. 404-407.

In this work, an ultra-low power LNA array targeted for piezoelectric transducers is presented. It adopts a capacitive feedback topology to provide an accurate and programmable voltage gain for the received transducer signals, while optimizing the noise-power trade-off. Moreover, a single-ended cascoded inverter with local supply regulation is employed as the OTA to further improve the power efficiency. Implemented in a 0.18 µm CMOS technology, each LNA channel occupies less than 0.01 mm$^2$ of silicon area and consumes only 0.135 mW from a 1.8 V supply. Acoustic measurements have been performed by connecting the LNA array to a PZT matrix on a silicon substrate. The acoustic results show that the design achieves a noise-efficiency-factor (NEF) of 0.22 mPa·√mW/Hz, which is 2.5 × better than the state-of-the-art [8].

### 2.2.2 LNA Architecture

The choice of the LNA architecture is dictated by the electrical impedance of the target transducer. Most reported CMUT interface circuits utilize a trans-impedance amplifier (TIA) as the LNA [3, 6, 12, 13], with an input impedance lower than the transducer’s impedance to sense the motional current. However, similarly-sized bulk PZT transducers, as applied in our work, have a much lower impedance around the center frequency [10], typically a few kΩ. This makes the TIA structure less effective, since creating a sufficiently low input impedance requires extra power spent on increasing the gain of the amplifier, rather than on reducing noise.

![Figure 2.4. Architecture of front-end integrated circuits for a 2-D piezoelectric transducer array.](image)

Instead, we propose to use a capacitive-feedback voltage amplifier, as shown in Figure 2.5. It offers a mid-band voltage gain of \( A_M = C_I / C_F \) and a bandwidth determined by the trans-conductance of the operational trans-conductance amplifier (OTA). Its input impedance is dictated by the input capacitor \( C_I \) and can be easily sized to tens of kΩ within the bandwidth of interest (1 MHz to 18 MHz), so as to sense the transducer’s voltage, rather than its current.

In order to minimize power consumption, the current-efficiency of the OTA has to be maximized, which makes the CMOS inverter an attractive candidate. A straightforward inverter-based implementation is shown in Figure 2.6 [14]. In order to bias the NMOS and PMOS transistors in their optimal region, the input voltage is AC coupled to the gates, while a DC control loop sets the output voltage to the mid-supply.

When applied to the targeted ultrasound application, the circuit of Figure 2.6 has the following limitations. The AC-coupling capacitors and the parasitic capacitor at the input gates form a capacitive divider, which attenuates the input signal and thus increases the input-referred noise of the LNA. Enlarging the AC-coupling capacitors is expensive in terms of the silicon area. Moreover, the circuit suffers from poor power-supply rejection, while in ultrasound probes, the supply lines can be noisy even in the signal band due to the co-integrated digital signal processing circuits. Finally, the feedback amplifier in the DC control loop is always connected to one of the inverter’s input nodes, where it adds noise and introduces extra parasitic capacitance.

Here, these problems are solved by applying the following techniques.

**Capacitor Splitting**

As shown in Figure 2.7a (the DC bias control loop is not shown in this figure for simplicity), the input bias for the NMOS and PMOS transistors can be separated by splitting the input and feedback capacitors into two equal pairs. As such, there is no need for adding extra AC-coupling capacitors. The mid-band gain becomes

\[
A'_M = \frac{1}{2} \left( \frac{C_{I_1}}{C_{F_1}} + \frac{C_{I_2}}{C_{F_2}} \right) = \frac{C_I}{C_F} = A_M \tag{2.16}
\]

where \( C_{I_1} / C_{F_1} = C_{I_2} / C_{F_2} = C_I / C_F \).

In order to maintain the same input impedance, the input capacitors in each half-branch should also be halved, as should the feedback capacitors. As a result, the overall capacitor area can be kept the same.

**Dual-rail Local Regulation**

An effective approach to improve the power-supply-rejection-ratio (PSRR) of an inverter-based amplifier is to regulate its supply lines. In this design, we propose to locally generate two internal power rails for an array of LNAs (Figure 2.7b), so as to reject noise from both the supply and the ground.

Given that the loading currents for both regulators are known and approximately constant, the implementation of the regulators can be kept simple to save both power and area. A capacitor-less low-dropout regulator (LDO) based on a super source-follower [15] is adopted as the topology for both regulators. It provides sufficient loop gain and thus a reasonable PSRR even within the frequency range of the ultrasonic signals.

![Figure 2.8. Dynamic bias control scheme.](image)

**Dynamic Bias Control**

To prevent the DC control loop from adding noise during the operation of the LNA, we propose to add switches that periodically activate the loop synchronously with the transmit/receive phases of the ultrasound system, as shown in Figure 2.8. During the transmit phase, when the LNA is inactive, the feedback amplifier is connected to the inverter-based OTA to activate the bias control loop. Thus, the OTA is auto-zeroed, and the settled bias voltage is stored at the gates of the input transistors. After the transmit phase, the feedback amplifier is disconnected from the OTA and has both inputs connected to mid-supply, while the LNA starts receiving the echo signal, operating at the “memorized” bias condition stored on the parasitic capacitors of the input gates. To prevent the bias voltage from drifting, the bias control loop is periodically enabled in each transmit time slot.

The effectiveness of this technique is guaranteed by the relatively short period of transmit/receive cycles in medical ultrasound imaging, which normally ranges from 100 µs to 200 µs, depending on the imaging depth. The relatively large sizes of the input transistors, needed to reduce flicker noise, also help to ensure the robustness of the bias voltages.

**Complete LNA Circuitry**

Figure 2.9 shows the complete schematic of the 9-channel LNA array. By applying the approaches discussed above, a power-efficient and area-compact implementation is obtained. In each LNA channel, a unity-gain-connected inverter, implemented with long-channel transistors and consuming only 0.4 µA, is connected between the two regulated supply rails to generate a reference voltage for the bias-control loop. The feedback amplifier is realized as a simple differential pair. It is designed to be able to settle within the transmit time slot (10 µs), resulting in a current consumption of less than 1 µA. The inverter-based OTA is cascoded to ensure an accurate closed-loop gain, and input transistors $M_1$ and $M_4$ are biased in the weak-inversion region to enhance their current-efficiency. The bias voltage of $M_1$ is derived from a diode-generated voltage reference $V_{refp}$ via a high-impedance pseudo-resistor. The same voltage is applied as the input of the positive-rail regulator. In this way, the bias current of the OTA is defined by the difference of the reference currents ($I_{p1} - I_{p2}$) and the dimension ratio of $M_1$ and $M_{p1}$. While operating, 50 µA is consumed by each OTA, and 110 µA by each of the shared regulators, leading to a total current consumption of 670 µA at 1.8 V for 9 channels, corresponding to 0.135 mW per channel.
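
The current budget quoted above can be re-derived in a few lines. This is a minimal sketch that only sums the OTA and regulator currents given in the text; the sub-µA reference inverter and feedback amplifier in each channel are deliberately left out, so the per-channel result is a slight underestimate. Variable names are illustrative.

```python
N_CHANNELS = 9         # LNA channels in the array
I_OTA_UA = 50.0        # current per inverter-based OTA, in µA
N_REGULATORS = 2       # positive-rail and negative-rail regulators
I_REGULATOR_UA = 110.0 # current per shared regulator, in µA
VDD_V = 1.8            # supply voltage, in V

# Sum the array current, then convert µA * V = µW into mW.
total_current_ua = N_CHANNELS * I_OTA_UA + N_REGULATORS * I_REGULATOR_UA
total_power_mw = total_current_ua * VDD_V / 1000.0
power_per_channel_mw = total_power_mw / N_CHANNELS

print(f"total current:     {total_current_ua:.0f} µA")
print(f"total power:       {total_power_mw:.3f} mW")
print(f"power per channel: {power_per_channel_mw:.3f} mW")
```

The sum reproduces the 670 µA total and lands within about 1 µW of the quoted 0.135 mW per channel.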

The cascoded inverter structure has a limited output swing. To enhance the dynamic range of the LNA, a programmable gain function is implemented by including switchable input and feedback capacitors. Three gain levels (24/6/-12 dB) are offered, of which the 24-dB gain is designed to achieve the best noise performance and the highest receive sensitivity.

### 2.2.4 Experimental Results

The LNA array has been fabricated in a standard 0.18 µm CMOS technology. Figure 2.10a presents a micro-photograph of the prototype chip. The outputs of the LNA array are multiplexed to three buffers that drive the external cables. The core of the chip, including 9 LNAs, 2 regulators and bias circuits, occupies an area of 450 µm × 200 µm, which is equivalent to 0.01 mm² per channel.

To facilitate the acoustic characterization of the LNA array, a 10 × 10 array of PZT transducers with a 200-µm pitch, built on top of a silicon substrate with metal interconnects (Figure 2.10b), is used to provide 9-channel inputs for the LNA array via PCB traces. The transducer provides a -6 dB bandwidth from 3.2 MHz to 4.8 MHz with a center frequency at 4 MHz.

Figure 2.11a shows the measured transfer function of the LNA at different gain settings. The measured mid-band gains at each gain setting are 22.50 dB, 4.65 dB and -13.15 dB respectively, which are approximately 1 dB lower than the designed gains due to attenuation of the output buffers. The measured -3 dB bandwidth is from 0.4 MHz to 11.6 MHz. Figure 2.11b shows the measured PSRR from 200 kHz to 20 MHz. The circuit achieves a PSRR better than -45 dB at 4 MHz. Figure 2.12a shows the measured input-referred voltage noise spectrum at the highest gain setting with a comparison to pre-layout simulation results. It is obtained by measuring the LNA output voltage noise and dividing it by the transfer function shown in Figure 2.11a, which indicates an input-referred voltage noise density of 5.9 nV/√Hz at 4 MHz. The integrated input noise voltage across the LNA bandwidth is 17.7 μVrms. The maximum input voltage is measured as 545 mVpp at the 1 dB compression point with the lowest LNA gain setting, where a -40 dB second-harmonic distortion is guaranteed. Thus, the circuit achieves an 81 dB overall dynamic range. Moreover, the measured channel-to-channel crosstalk is below -46 dB at 4 MHz.
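
The 81 dB figure can be reproduced from the two measured quantities. This is a minimal sketch, assuming a sinusoidal input (so that the rms value is the peak-to-peak value divided by 2√2) and taking the integrated input-referred noise at the highest gain setting as the noise floor; variable names are illustrative.

```python
import math

V_MAX_PP = 0.545         # maximum input at 1 dB compression, V peak-to-peak
V_NOISE_RMS = 17.7e-6    # integrated input-referred noise, V rms

# Convert peak-to-peak to rms for a sine wave, then take the ratio in dB.
v_max_rms = V_MAX_PP / (2.0 * math.sqrt(2.0))
dynamic_range_db = 20.0 * math.log10(v_max_rms / V_NOISE_RMS)

print(f"maximum input: {v_max_rms * 1e3:.1f} mV rms")
print(f"dynamic range: {dynamic_range_db:.1f} dB")
```

Note that the two numbers are taken at different gain settings, which is why this is an overall dynamic range rather than an instantaneous one.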

An acoustic measurement involving the LNA array and the PZT matrix has been performed in a water-tank. A calibrated unfocused piezoelectric transducer, placed 8.5 cm away, is used as the source. Figure 2.12b shows the observed signal from a single LNA output with a 1 Vpp 4 MHz 3-cycle burst sinusoidal wave applied to the transmitter. The incident pressure has been measured with a commercial hydrophone, giving an estimated receive sensitivity of 130 mV/kPa at the highest LNA gain setting.

A performance summary and comparison with state-of-the-art ultrasound front-end amplifiers is shown in Table 2.1, including their noise-efficiency factor (NEF) as defined in [6]. The input-referred electrical noise density is translated to acoustic pressure by the receive sensitivity of the transducer to facilitate the comparison with other work. With the lowest power consumption and a comparable input-referred noise, the proposed LNA achieves a NEF that is 2.5× better than the state-of-the-art.

Figure 2.12. (a) Measured and simulated input-referred voltage noise spectrum (top); (b) Received acoustic signal at the output of the LNA (bottom).

### TABLE 2.1 Performance Summary and Comparison

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Target Transducer</th>
<th>Element Size (µm × µm)</th>
<th>LNA Bandwidth (MHz)</th>
<th>Power Cons. (mW)</th>
<th>Receive Sensitivity (mV/kPa)</th>
<th>Dynamic Range* (dB)</th>
<th>Input-referred Noise (mPa/√Hz)</th>
<th>NEF (mPa·√mW/Hz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[3]</td>
<td>2D-CMUT</td>
<td>250 × 250</td>
<td>10</td>
<td>4.0</td>
<td>~70</td>
<td>N/A</td>
<td>1.8 @ 5 MHz</td>
<td>3.6</td>
</tr>
<tr>
<td>[12]</td>
<td>2D-CMUT</td>
<td>250 × 250</td>
<td>25</td>
<td>2.4</td>
<td>414</td>
<td>N/A</td>
<td>0.9 @ 4.4 MHz</td>
<td>1.4</td>
</tr>
<tr>
<td>[6]</td>
<td>1D-CMUT</td>
<td>300 × 3000</td>
<td>5.2</td>
<td>14.3</td>
<td>162</td>
<td>60</td>
<td>0.56 @ 3 MHz</td>
<td>2.1</td>
</tr>
<tr>
<td>[13]</td>
<td>2D-CMUT</td>
<td>250 × 250</td>
<td>10.2</td>
<td>1.4</td>
<td>123</td>
<td>N/A</td>
<td>2.3 @ 5 MHz</td>
<td>2.7</td>
</tr>
<tr>
<td>[8]</td>
<td>1D-CMUT</td>
<td>N/A</td>
<td>39.5</td>
<td>1.0</td>
<td>72</td>
<td>70</td>
<td>0.55**</td>
<td>0.55</td>
</tr>
<tr>
<td>This work</td>
<td>2D-PZT</td>
<td>200 × 200</td>
<td>11.2</td>
<td>0.135</td>
<td>130</td>
<td>81</td>
<td>0.6 @ 4 MHz</td>
<td>0.22</td>
</tr>
</tbody>
</table>

* Defined as the ratio of the maximum input signal at the 1-dB compression point and the minimum input signal at SNR = 0 dB.

** Averaged value across the bandwidth from 4.4 MHz to 15.4 MHz.
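
The NEF column of the table, and the pressure-referred noise of this work, can be cross-checked from the other entries. This is a minimal sketch, assuming that NEF is the product of the input-referred noise density and the square root of the power consumption, and that the 130 mV/kPa sensitivity is referred to the LNA output at the measured 22.5 dB gain; the dictionary and variable names are illustrative.

```python
import math

# (input-referred noise in mPa/sqrt(Hz), power consumption in mW) per table row
TABLE_ROWS = {
    "[3]": (1.8, 4.0),
    "[12]": (0.9, 2.4),
    "[6]": (0.56, 14.3),
    "[13]": (2.3, 1.4),
    "[8]": (0.55, 1.0),
    "This work": (0.6, 0.135),
}

nef = {ref: noise * math.sqrt(power) for ref, (noise, power) in TABLE_ROWS.items()}
for ref, value in nef.items():
    print(f"{ref:>9}: NEF = {value:.2f} mPa*sqrt(mW/Hz)")

# Translate the electrical noise density of this work into acoustic pressure.
V_NOISE_IN = 5.9e-9            # input-referred noise density, V/sqrt(Hz)
GAIN_DB = 22.5                 # measured mid-band gain at the highest setting
SENSITIVITY_V_PER_PA = 130e-3 / 1e3  # 130 mV/kPa expressed in V/Pa

v_noise_out = V_NOISE_IN * 10.0 ** (GAIN_DB / 20.0)
pressure_noise_mpa = v_noise_out / SENSITIVITY_V_PER_PA * 1e3
improvement = nef["[8]"] / nef["This work"]

print(f"pressure-referred noise: {pressure_noise_mpa:.2f} mPa/sqrt(Hz)")
print(f"NEF improvement over [8]: {improvement:.1f}x")
```

Under these assumptions the recomputed values agree with the tabulated ones to the precision shown in the table.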

### 2.2.5 Conclusions

A power-efficient and compact LNA array for piezoelectric transducers has been presented. It is implemented as a capacitive-feedback voltage amplifier that senses the voltage from the transducer. A single-ended inverter-based OTA with dual-rail regulation and dynamic bias control is employed to achieve an ultra-low power consumption and a record NEF, which has been demonstrated by both electrical and acoustic measurements.

## 2.3 A Single-cable LNA Readout IC for PVDF Transducer

### 2.3.1 Introduction

Cardiovascular diseases (CVD) are the primary cause of death worldwide, leading to more than 30% of the global mortality per year [16]. Among these cardiac deaths, almost half are caused by acute coronary syndromes, which are usually associated with ruptures of vulnerable plaques and thrombosis in the coronary artery. Over the last decades, a variety of imaging modalities have been developed to accurately detect the presence of vulnerable plaques [17]. A recent addition to the range of intravascular imaging techniques, intravascular photoacoustic (IVPA) imaging has proved its capabilities in identifying and locating lipid components in the vessel wall [18], which are the major risk factor for plaque rupture [19], with reasonably large imaging depth [20] and high chemical specificity for lipid type [21]. This imaging technique creates an ultrasonic image of the optical absorption in atherosclerotic plaques by recording the emitted pressure wave following excitation by a short optical pulse. An established fact is that the frequency range of the photoacoustic (PA) signal is inversely proportional to the dimensions of absorbing structures [20]. Therefore, it is critical to match the transducer sensitivity to the signal frequency content for in vivo imaging, which is expected to range from 2 MHz to 15 MHz. However, conventional intravascular ultrasound (IVUS) transducers used in PA imaging systems have a receive bandwidth typically above 20 MHz [22, 23]. Thus, there is a clear need for a broadband and sensitive receiver dedicated to IVPA imaging to capture PA signals in the frequency range from 2 MHz to 15 MHz.

---

8 This section is based on publication “A Single-Cable PVDF Transducer Readout IC for Intravascular Photoacoustic Imaging,” in Proc. IEEE Ultrasonics Symp. (IUS), Oct. 2015.

In this work, we propose to use polyvinylidene fluoride (PVDF) single-element transducers for sensing PA signals. Compared to the PZT ceramics normally used in IVUS imaging, PVDF transducers offer a broader receive bandwidth even with a small sensing area, thanks to their relatively low acoustic impedance. However, a small PVDF element exhibits a high electrical impedance (>> 1 kΩ) and a relatively low capacitance (< 5 pF/mm²), while the connecting coaxial cable usually has a low characteristic impedance (~ 50 Ω) and a much higher capacitance (~ 100 pF/m). As such, the capacitive loading effect of the connecting coaxial cable will result in significant signal attenuation, thus dramatically reducing the sensitivity of the PVDF transducer. To address this problem, a readout integrated circuit (IC) that can be closely integrated with the PVDF transducer and provide the cable-driving capability is called for.
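
The size of the cable loading effect can be estimated with a simple capacitive divider. This is a rough sketch, assuming the PVDF element behaves as a voltage source behind its own capacitance and the cable as a lumped capacitance to ground; the 0.6 mm × 0.6 mm element and the 1 m cable are the dimensions of the prototype described in Section 2.3.2, and the capacitance per area is the upper bound quoted above, so the result is a best case.

```python
import math

ELEMENT_SIDE_MM = 0.6          # side of the square PVDF element, mm
C_PER_AREA_PF_MM2 = 5.0        # upper bound on PVDF capacitance, pF/mm^2
CABLE_C_PF_PER_M = 100.0       # typical coaxial cable capacitance, pF/m
CABLE_LENGTH_M = 1.0           # cable length, m

c_element_pf = C_PER_AREA_PF_MM2 * ELEMENT_SIDE_MM ** 2
c_cable_pf = CABLE_C_PF_PER_M * CABLE_LENGTH_M

# Capacitive divider: fraction of the source voltage that reaches the cable end.
attenuation = c_element_pf / (c_element_pf + c_cable_pf)
attenuation_db = 20.0 * math.log10(attenuation)

print(f"element capacitance: {c_element_pf:.2f} pF")
print(f"signal remaining:    {attenuation * 100:.1f} % ({attenuation_db:.1f} dB)")
```

With less than 2 pF driving roughly 100 pF, only a few percent of the signal survives an unbuffered connection, which motivates placing the readout IC directly at the transducer.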

Figure 2.13 shows a conceptual diagram of the proposed IVPA catheter core with a PVDF transducer and a readout IC integrated at the tip. The target outer diameter of the catheter is 1 mm. This constrained space needs to fit an optical fiber, a flexible drive shaft and the electrical wiring required by the readout IC. So far, most reported readout ICs implemented for intravascular ultrasound probes derive their power supply from external voltage sources [22]. Thus, additional electrical connections to the catheter tip are required to transfer supply voltages. Moreover, in order to minimize the voltage drop caused by the DC resistance of cables, the core diameter of the supply cable may not be too small (normally > 0.3 mm). Such limitations further reduce the flexibility of the catheter and increase the difficulty of the catheter assembly.

In this section, we present a readout IC that is directly integrated with a single-element PVDF transducer to capture the PA signals. A current-mode powering scheme is applied to eliminate the need for extra supply cables, which helps in reducing the rigidity of the catheter. A capacitive-feedback transimpedance amplifier is adopted as the front-end amplifier to improve the signal-to-noise ratio of received signals. A prototype IC has been designed and fabricated to prove the concept. Its electrical performance has been evaluated. In addition, the acoustic characteristics of the co-integrated readout IC and PVDF transducer have been measured to further demonstrate the effectiveness of the proposed techniques.

### 2.3.2 Prototype Assembly

Figure 2.14 illustrates the architecture of the proposed IVPA catheter with the readout IC integrated with a single-element PVDF transducer. A direct interconnection scheme was applied to connect the PVDF element and the readout IC. In our prototype, a layer of electrically-conductive glue was first applied on top of the transducer bondpads of the readout IC, which are designed for electrical contact with the PVDF element. A 52-μm-thick PVDF film with electrodes on both sides was cut in a square shape (0.6 mm × 0.6 mm) using a laser micromachining workstation. This PVDF element was then directly mounted on top of the readout IC via the conductive glue layer. Thus, the conductive glue layer creates the electrical connection between the readout IC and the PVDF element. The readout IC was then wire-bonded to a flex-circuit, which provides the connection between the IC and the solder-pads for a 1 m single micro-coaxial cable.

### 2.3.3 Readout IC Design

**System Overview**

Figure 2.15 depicts the schematic diagram of the proposed single-cable PVDF transducer readout IC. Similar to conventional implementations [24], the proposed readout IC consists of a trans-impedance amplifier (TIA) and a source follower. The difference is, however, that both circuit blocks are powered by an external current source ($I_{\text{bias}}$), instead of a voltage source. As such, the micro-coaxial cable carries both the bias current and the signal current drawn by the source follower. The ground reference of the IC is provided by the shield of the micro-coaxial cable, which is also connected to the top electrode of the PVDF element. Such a current-mode powering scheme makes it possible to connect the catheter tip and the external imaging system with a single coaxial cable.

At the system side, the output signal can be extracted as a voltage $V_{\text{OUT}}$ using a high-pass filter ($R_{\text{load}}$ and $C_{\text{ac}}$). In our prototype, both the bias current source and the high-pass filter have been implemented on an adaptor PCB, which transfers the output voltage to, and derives the power supply from, the external imaging system.

**TIA**

The TIA converts the current generated by the PVDF transducer into a voltage. The transimpedance gain is determined by the parallel impedances of the feedback components, i.e. $C_f$ and $R_f$. When referred to the input of the TIA, the noise contributed by succeeding circuits and the cable can be reduced by increasing the transimpedance gain. As a result, the TIA also performs as a low-noise amplifier.

According to [25], the input-referred current noise of the TIA is:

$$i_{n, in}^2 = v_{n, amp}^2 \left( \omega^2 (C_f + C_{p1} + C_{p2})^2 + \frac{1}{R_f^2} \right) + \frac{4kT}{R_f} \quad (2.17)$$

where $\omega$ is the angular frequency, $k$ is the Boltzmann constant, $T$ is the absolute temperature, $C_{p1}$ is the parasitic capacitance of the transducer element, $C_{p2}$ is the input capacitance of the TIA, and $v_{n, amp}^2$ is the input-referred noise of the common-source amplifier ($M_1$ and $M_2$). Thus, a large feedback resistance is desirable for optimizing the noise performance. When $\omega R_f C_f \gg 1$, the feedback capacitance dominates the transimpedance, and the input-referred noise becomes:

$$i_{n, in}^2 = v_{n, amp}^2 \omega^2 (C_f + C_{p1} + C_{p2})^2 \quad (2.18)$$

In this case, only the common-source amplifier contributes noise, leading to a better noise-power trade-off. For this reason, a feedback capacitance $C_f$ of 100 fF and a large feedback resistance $R_f$ of 6 MΩ are used, giving a transimpedance gain of about 100 dBΩ at 2 MHz. Note that the capacitive transimpedance will introduce a slight frequency-dependence in the overall system gain [7].
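
Whether the chosen feedback components satisfy $\omega R_f C_f \gg 1$ over the band of interest can be checked directly. This is a minimal sketch using the stated $C_f$ and $R_f$ and the 2 MHz to 15 MHz band edges from the introduction; the function name is illustrative.

```python
import math

C_F = 100e-15   # feedback capacitance, F
R_F = 6e6       # feedback resistance, ohm


def omega_rc(freq_hz, r=R_F, c=C_F):
    """Return the dimensionless product omega * R_f * C_f at freq_hz."""
    return 2.0 * math.pi * freq_hz * r * c


# Frequency above which the capacitor, not the resistor, sets the transimpedance.
corner_hz = 1.0 / (2.0 * math.pi * R_F * C_F)

print(f"corner frequency: {corner_hz / 1e3:.0f} kHz")
for f in (2e6, 15e6):
    print(f"omega*R_f*C_f at {f / 1e6:4.1f} MHz: {omega_rc(f):.1f}")
```

The corner sits well below 2 MHz, so the product stays above roughly 7 across the band, and the approximation is tightest at the upper band edge.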

**Source Follower**

The output voltage of the TIA is buffered by the source follower formed by transistor $M_3$, which provides a low output impedance to match with the 50 Ω characteristic impedance of the micro-coaxial cable.

As mentioned in the system overview above, the external bias current is shared by the source follower and the TIA. A current-mirror formed by $M_2$ and $M_4$ is implemented so as to divide this bias current between both circuit blocks. The choice of the current-mirror ratio is dictated by the different current requirements of both circuit blocks so as to achieve the desired noise level and bandwidth.

Furthermore, in order to prevent the output signal current from feeding back into the TIA, a low-pass filter \((R_p, C_p)\) is included between the gates of \(M_2\) and \(M_4\). As long as \(\omega_L R_p C_p \gg 1\), where \(\omega_L\) is the lower bound of the signal bandwidth, the signal voltage injected to the gate of \(M_2\) is negligible, and the bias current for the TIA can be considered as constant.

Consequently, the in-band output impedance of the readout IC is only determined by the source follower and the diode-connected transistor \((M_4)\). It can be approximated as:

\[
Z_0 \approx \frac{1}{g_{m3}} + \frac{1}{g_{m4}} \tag{2.19}
\]

where \(g_{m3}\) and \(g_{m4}\) are the transconductances of \(M_3\) and \(M_4\), respectively. Since \(g_{m3}\) and \(g_{m4}\) are both functions of the bias current \([26]\), the output impedance of the readout IC can be fine-tuned by adjusting \(I_{bias}\) to match with the characteristic impedance of the coaxial cable.

### 2.3.4 Experimental Results

The proposed readout IC has been fabricated in a standard 0.18 µm CMOS technology. Figure 2.16 presents a micro-photograph of the prototype readout IC that has been wire-bonded to a PCB. The dimensions of the die are designed as 1 mm × 0.6 mm to fit the size of the target transducer, while only a small portion of the area (~ 10%) is occupied by the circuits. A PVDF transducer has been mounted on top of the IC with the approach described in Section 2.3.2.

While operating, the IC consumes 6 mA from an external current source, which is powered by a 3.3 V supply. The DC voltage measured at the output node of the IC is around 2.2 V, leading to an on-chip power consumption of 13.2 mW.

Figure 2.17a shows the normalized measured frequency response of the readout IC. The measurement was performed by feeding a sinusoidal voltage source to the input of the readout IC via a 1 pF capacitor, which has an impedance comparable to that of the target PVDF transducer. The measured -3 dB bandwidth extends beyond the frequency range from 1 MHz to 20 MHz. Figure 2.17b shows the measured output voltage noise spectrum of the readout IC. The integrated output rms noise voltage across the bandwidth from 1 MHz to 20 MHz is 250 µV. A summary of the electrical performance of the readout IC is given in Table 2.2.

Acoustical measurements involving the readout IC and the integrated PVDF transducer have been performed in an oil tank. To characterize the dynamic range of the readout IC and the transducer, an unfocused single-element transducer with a center frequency of 2.25 MHz was used as the acoustical signal source. A 5-cycle sinusoidal burst with varying peak-to-peak amplitude was applied to the transmitter, generating acoustic pressures ranging from 0.4 Pa to 40 kPa at the surface of the prototype transducer. The signal received by the readout IC was recorded by an oscilloscope and filtered in MATLAB with a bandpass filter between 1.75 MHz and 2.75 MHz. Results obtained from this measurement are presented in Figure 2.18, where the peak-to-peak voltage received by the readout IC is plotted against the peak-to-peak acoustic pressure incident on the surface of the transducer. The minimum detectable (peak-to-peak) pressure is approximately 30 Pa, while the maximum pressure is about 30 kPa. The dynamic range thus found is around 60 dB.
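The dynamic-range figure follows directly from the two pressure limits quoted above. The following minimal sketch reproduces that arithmetic; the function and constant names are illustrative only and are not part of the measurement scripts used in this work.

```python
import math


def dynamic_range_db(p_max_pa, p_min_pa):
    """Dynamic range in dB for a ratio of pressure (amplitude) quantities."""
    return 20.0 * math.log10(p_max_pa / p_min_pa)


P_MIN_PA = 30.0   # minimum detectable peak-to-peak pressure quoted in the text
P_MAX_PA = 30e3   # maximum peak-to-peak pressure quoted in the text

dr_db = dynamic_range_db(P_MAX_PA, P_MIN_PA)
print(f"dynamic range: {dr_db:.1f} dB")
```

Because pressure is an amplitude quantity, the factor of 20 (rather than 10) is used in the logarithm.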

The receiving transfer function of the readout IC and the PVDF transducer has also been measured by using a broadband focused transducer as the source transmitter. The frequency response of the proposed readout IC with the PVDF transducer is flat in the frequency range from 2 MHz to 15 MHz. At 2.25 MHz, the measured sensitivity of the prototype is 3.8 µV/Pa.
### 2.3.5 Conclusions

A readout IC integrated with a broadband PVDF transducer has been developed and demonstrated as the receiver for PA signals in an IVPA catheter with a diameter of 1 mm. By employing a unique current-mode powering scheme, the readout IC is capable of operating with a single micro-coaxial cable, which optimizes the flexibility of the catheter. Such a small detector can be easily accommodated within a disposable catheter.

The performance of the proposed readout IC together with the co-integrated PVDF transducer has been evaluated by both electrical and acoustical measurements. The results show that the proposed readout IC has a flat frequency response from 1 MHz to 20 MHz, while the integrated output rms noise voltage in the same band is only 250 µV. When integrated with the PVDF transducer, it presents a broad receiving bandwidth from 2 MHz to 15 MHz, a 60 dB dynamic range and a sensitivity of about 3.8 µV/Pa at 2.25 MHz. The minimum detectable pressure is measured as 30 Pa. These characteristics demonstrate the effectiveness of the proposed readout IC architecture. Such a sensitive PVDF receiver will significantly improve the sensitivity of IVPA imaging of intra-plaque lipids in humans and can dramatically decrease the required energy of the laser pulse.



CHAPTER 3

PZT MATRIX WITH INTEGRATED RECEIVE ASIC


3.1 Introduction

Echocardiography is a popular cardiac imaging tool used for accurate diagnosis of cardiovascular diseases and for guiding interventional procedures. Its advantages over alternative imaging techniques include its low cost, its non-invasive character and its capability of producing real-time images. Transesophageal echocardiography (TEE) has been developed to complement the more standard transthoracic echocardiography (TTE). As its name suggests, TEE uses the esophagus as the imaging window to the heart and thus eliminates the reflections from the lungs and ribs, which limit the image quality in TTE [1]. Conventional TEE probes employ a one-dimensional (1-D) ultrasound array to obtain fan-shaped two-dimensional (2-D) cross-sectional images of the heart. The array can be mechanically rotated around the image axis to obtain images within a conical volume. However, 2-D imaging falls short in the case of complicated interventions, since cardiac morphology, leakage of valves and function of the outflow tracts are all 3-D phenomena that are difficult to interpret from 2-D images in a non-standard 3-D anatomy. Therefore, there is a clear clinical need for TEE probes that are capable of providing real-time 3-D images [2].

Commercially-available 3-D TEE probes (X7-2t, Philips Ultrasound, Bothell, WA and Vivid E9 BT12, General Electric Healthcare, Amersham, UK) have a large head volume (~ 10 cm$^3$) and are intended for procedures in adults only. Unfortunately, a TEE procedure involving such a large probe is associated with intolerance in about 1% of cases and high levels of patient discomfort in around 35% of cases [3, 4]. To minimize this discomfort, procedures in non-anesthetized patients are generally limited to less than about 20 minutes [3]. Consequently, a 3-D TEE probe of significantly smaller size that is suitable for sustained monitoring and for children becomes compelling. The target of the work presented in this chapter is to demonstrate and assess technologies that enable the implementation of a miniature 3-D TEE probe with a head volume < 1 cm$^3$. Such a small probe will enable the use of 3-D TEE for monitoring of cardiac procedures in adults and make real-time 3-D TEE imaging feasible in newborns and children.

Several challenges exist in the design of a miniature 3-D TEE probe. In conventional 2-D TEE probes, the 1-D phased array is mounted at the tip of a specialized gastroscopic tube, through which all piezo-electric transducer elements are individually connected to a bed-side ultrasound system via micro-coaxial cables with a typical outer diameter of 300 µm. Typically, for an imaging depth of 6 cm to 12 cm, an array consisting of 32 to 128 elements, operating between 5 MHz and 10 MHz, is used to obtain a field of view of ±45° to ±60° in azimuthal direction. In 3-D probes, however, the field of view extends in both azimuthal and elevation directions, calling for a 2-D matrix transducer with several thousands of independent elements [2]. For such a large array, it is not feasible for the gastroscopic tube to accommodate the corresponding number of coaxial cables while remaining flexible [5]. Moreover, the density of the elements makes their individual wiring very intricate. Moving signal processing modules into the probe is an efficient way to reduce the number of fan-out cables. However, the limited volume of the probe tip precludes the possibility to accommodate commercial ultrasound front-end integrated circuits (e.g. AD9271, Analog Devices, Norwood, MA) or multi-chip systems [5, 6] within the probe. Therefore, an application-specific integrated circuit (ASIC) that could be closely integrated with the 2-D transducer matrix is desired for a 3-D TEE probe [2]. This integrated approach also provides an effective way to make connections with all individual elements.

An additional challenge is that the in-probe power dissipation of a 3-D TEE probe, caused by either the transducers or the electronics, should comply with FDA regulations [7] to prevent excessive tissue temperature rise. This requirement sets an upper bound to both the largest transmit power and the self-heating power associated with the in-probe electronics. For comparison, the heat dissipation of pulse transmission in a standard 2-D TEE probe without electronics was estimated at 1.0 W. 3-D TEE probes have to operate within a comparable overall power budget. To ensure that the in-probe integrated circuit does not contribute a significant increase to self-heating, its total power consumption should be limited to about 0.5 W, which is equivalent to 0.5 mW/channel in a 1000-element array. This is beyond the state-of-the-art of front-end ultrasound ASICs reported in open literature, which consume several milliwatts per channel [8, 9]. Within this strict power budget, a dedicated circuit design that fully optimizes the power efficiency of each ASIC building block is certainly called for.
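The per-channel budget quoted above is a simple division of the ASIC power budget over the element count. The short sketch below reproduces it and, for contrast, scales up a prior-art per-channel figure. The value of 3 mW/channel is a hypothetical stand-in for "several milliwatts per channel" and is not taken from [8, 9].

```python
def per_channel_budget_mw(asic_budget_w, n_channels):
    """Evenly divide the ASIC power budget over the receive channels (in mW)."""
    return asic_budget_w * 1e3 / n_channels


ASIC_BUDGET_W = 0.5   # total in-probe ASIC budget quoted in the text
N_CHANNELS = 1000     # element count quoted in the text

budget_mw = per_channel_budget_mw(ASIC_BUDGET_W, N_CHANNELS)

# HYPOTHETICAL comparison point: "several milliwatts per channel" taken as 3 mW.
assumed_prior_art_mw = 3.0
prior_art_total_w = assumed_prior_art_mw * N_CHANNELS / 1e3

print(f"budget per channel: {budget_mw} mW")
print(f"assumed prior-art total for {N_CHANNELS} channels: {prior_art_total_w} W")
```

Under this assumption the prior-art total exceeds the 0.5 W budget several times over, which is the gap the dedicated circuit design has to close.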

Moreover, from an acoustic point of view, a particular challenge is to avoid ringing and crosstalk of the piezo-electric elements. In conventional 1-D arrays, the elements are backed by a relatively thick slab of damping material, to avoid reflections from the backside of the piezo elements and to minimize lateral waves that may couple into neighboring elements. In the integrated design, the opposite would be achieved when the 2-D piezo array is in direct mechanical contact with the silicon of the ASIC, which is a thin slab of material with negligible damping. To minimize ringing and crosstalk in the integrated design, a specially designed interconnect layer should be applied in between the piezo array and the ASIC.

To demonstrate and assess the technologies that will be used in a full-blown miniature 3-D TEE probe, we present a prototype with a reduced number of elements in this chapter. The prototype consists of a $9 \times 12$ element PZT matrix that is integrated with a receive ASIC. This ASIC is capable of providing the required order-of-magnitude cable reduction by applying subarray signal processing, while being sufficiently compact and power-efficient for in-probe integration. The prototype has been experimentally evaluated, both electrically and acoustically.

The proposed concepts can later be applied to realize a full >1000-element miniature 3-D TEE probe. Figure 3.1 illustrates the physical architecture of the envisioned full probe. A lead zirconate titanate (PZT) transducer matrix with 1024 elements (32 × 32) is directly mounted on top of the ASIC. As proposed in [5], a split-array design is adopted, in which a small directly-wired subarray is used to transmit a wide beam, while the major part of the array is dedicated to receiving the resulting echoes through the ASIC. The transmit subarray will be optimized to enable parallel beamforming in reception and to guarantee a sufficiently high frame rate. A specialized flex-circuit acts as the bonding substrate of the ASIC, and provides the electrical connection between the ASIC and solder pads for micro-coaxial cables. These cables are accommodated in a flexible gastroscope tube with a length of 1 m to 2 m and a diameter of about 5 mm and connected to the probe handle, which connects via a thicker cable to the external imaging system. Thanks to the channel reduction performed by the ASIC, the number of cables required by the proposed 3-D TEE probe is comparable to what is currently used in miniature 2-D TEE probes.

In this chapter, the designs of the prototype PZT matrix and the integrated receive ASIC are presented. Section 3.2 describes the design of the PZT matrix and the signal processing method utilized in the ASIC, as well as the PZT-on-CMOS integration approach. The circuit implementation details of the ASIC are described in Section 3.3. Section 3.4 presents the experimental setup and both electrical and acoustic measurement results. Conclusions are given in Section 3.5.

3.2 Methods

3.2.1 Transducer Matrix on CMOS

A schematic diagram of the prototype is depicted in Figure 3.2. It is constructed by mounting a 9 × 12 PZT matrix on top of an ASIC. Commercially-available CTS 3203 HD is used as the piezo-ceramic and the acoustical stack is optimized using PZFlex (Weidlinger Associates Inc., Mountain View, CA) for a centre receive frequency of 5 MHz and a 50% bandwidth, which is adequate for the fundamental imaging typically used in TEE applications. The entire array measures 1.8 mm × 2.4 mm with an element pitch of 200 µm and a dicing kerf of 30 µm. An array consisting of 9 × 9 elements is directly interfaced to the pre-amplifiers in the ASIC. The remaining $9 \times 3$ elements are wired out through metal interconnections in the ASIC and are used for the characterization of single elements without electronics.

Given the limited space in a miniature TEE probe, the PZT matrix must be directly stacked on top of the ASIC, requiring a high-density interconnect scheme to provide the large number of electrical connections between the transducer elements and the ASIC. Several design constraints apply here. First of all, the assembly process for PZT transducers requires a working temperature well below the Curie point of the piezo-material to prevent de-polarization. In our design, the Curie temperature of the chosen material is 225°C. To keep a proper margin, the integration process temperature should be controlled below 110°C. Another challenge comes from the construction of the PZT matrix, which is cut from a PZT slab after it has been mounted on the ASIC. When dicing the piezoelectric ceramic, the cut must be deep enough to guarantee mechanical and electrical separation of the individual transducer elements, without damaging the ASIC. Therefore, there is a need for an isolation layer that is sufficiently thick to act as a dicing buffer between the PZT matrix and the ASIC.

PZT-on-CMOS integration solutions that have been reported so far all rely on intermediate connectors. In [10], a custom-designed interconnection block based on anisotropic elastomers is utilized as an intermediate connector between multiple ASICs and the PZT matrix. The volume of such an interconnection block, however, exceeds the space available in a miniature TEE probe. The flip-chip bonding technology, which is widely used in the semiconductor industry, has been applied to the direct integration of CMUT transducers and the associated front-end ICs [9, 11]. However, direct flip-chip bonding of a PZT matrix to an IC is generally not applicable, because of both the lack of a dicing buffer layer and the minimum temperature requirement of the flip-chip bonding process [12].

In this work, we apply a direct PZT-on-CMOS integration scheme as illustrated in Figure 3.3. A metallic interconnection layer is applied on top of the transducer bond-pads of a CMOS ASIC, which are arranged in a matrix pattern with the same pitch as the transducer array. A non-conductive epoxy layer is then deposited, filling the gaps between the metal. The epoxy is then ground down to expose the metal and form electrical contacts, allowing the epoxy to act as an electrical isolation layer as well as a mechanical dicing buffer. The PZT matrix is constructed on top of the ground-down epoxy layer. The electrical connection between the contacts and the electrode on the back-side of the piezoelectric ceramic is created by a layer of electrically conductive glue. This layer also has the acoustical function of minimizing ringing and crosstalk of the elements. For the purpose of acoustic matching, a conductive matching layer is applied on top of the piezoelectric layer. After that, the stack is diced to create the 2-D array. The dicing kerfs extend into the epoxy layer for approximately 10 µm, which guarantees the electrical separation of the transducer elements. The dicing kerfs are air-filled to minimize the crosstalk between elements. A ground foil is then glued on top of the matching layer to create a common counter electrode for all elements.

This integration scheme is particularly suited for prototyping and small-volume production. Alternatively, similar direct integration of PZT on CMOS can be achieved using cleanroom post-processing to form a dicing buffer layer on top of a CMOS chip [13].

3.2.2 Micro-beamforming

To reduce the number of output signal channels, the subarray beamforming scheme [10], also referred to as “micro-beamforming” or “pre-steering” [14], is adopted in this work. The operation principle of this approach is depicted in Figure 3.4. The transducer matrix is divided into subarrays of $3 \times 3$ elements, the receive signals of which are combined by a local micro-beamformer circuit in the ASIC to reduce the number of channels by a factor of 9. The delay applied to the received signals is divided into a coarse delay, which differs between subarrays but is common to all elements within one subarray, and a fine delay for each individual element. The fine delays are applied locally in the micro-beamformer, while the coarse delays are applied in the imaging system that further processes the output signals of the micro-beamformers.

To simplify the control of the beamforming electronics, all $3 \times 3$ subarrays are given the same fine delay pattern. As such, the fine delays tilt all beam axes of the subarrays to a certain angle, thus effectively changing the beam direction of the entire array [14]. Compared to the situation in which all elements have the ideal time delay, this pre-steering approach comes at the cost of a slightly degraded beam profile [5]. A delay depth of 280 ns, corresponding to 7 delay steps with a step size of 40 ns, is required for each element to ensure that each $3 \times 3$ subarray can cover a pre-steering range of $-37^\circ$ to $37^\circ$.
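To make the fine-delay pattern concrete, the sketch below computes quantized plane-wave delays for one $3 \times 3$ subarray. It is an illustration under stated assumptions rather than the delay calculation used in this work: the speed of sound (1540 m/s), the angle parameterization (polar angle `theta_deg` and in-plane rotation `phi_deg`), the sign convention and the function name are all assumed. The pitch, the step size and the 7-step depth are taken from the text.

```python
import math

PITCH_M = 200e-6       # element pitch (from the text)
STEP_S = 40e-9         # delay step size (from the text)
MAX_STEPS = 7          # 7 steps of 40 ns give the 280 ns delay depth (from the text)
SOUND_SPEED = 1540.0   # m/s, ASSUMED typical soft-tissue value


def fine_delay_steps(theta_deg, phi_deg=0.0, n=3):
    """Quantized fine delays (in 40 ns steps) for an n x n subarray.

    The delays align a plane wave arriving from direction (theta, phi);
    the smallest delay in the subarray is shifted to zero.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    ux = math.sin(theta) * math.cos(phi)   # direction cosine along columns
    uy = math.sin(theta) * math.sin(phi)   # direction cosine along rows
    raw = [[(col * PITCH_M * ux + row * PITCH_M * uy) / SOUND_SPEED
            for col in range(n)] for row in range(n)]
    t_min = min(min(row) for row in raw)
    return [[min(MAX_STEPS, round((t - t_min) / STEP_S)) for t in row]
            for row in raw]


for row in fine_delay_steps(37.0):
    print(row)
```

Since every subarray receives the same pattern, only this small matrix of step counts has to be programmed for a given steering direction; the residual subarray-to-subarray delay is left to the coarse delays in the imaging system.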

3.3 Implementation of the Receive ASIC

This section describes the implementation details of the receive ASIC. The ASIC processes the echo signals received by the transducer elements in two steps: front-end signal conditioning, including pre-amplification and time-gain compensation (TGC), and micro-beamforming. Figure 3.5 shows a block diagram of the ASIC. It contains 81 input channels, each being interfaced to one transducer element in the $9 \times 9$ array. The channels are divided into 9 subarrays; signals from the 9 elements of each subarray are first amplified, then aligned in time using sample-and-hold (S/H) delay lines and summed up by charge redistribution among the outputs of 9 delay lines to implement the micro-beamforming. The output signal of each subarray is transmitted to the external imaging system via a 1 m to 2 m micro-coaxial cable; an on-chip cable driver provides the required output drive capability.

In the following, the design considerations for the front-end amplifiers, the micro-beamformers and auxiliary circuits will be discussed. A brief overview of the silicon realization will also be given.

### 3.3.1 Front-end Amplifiers

The front-end signal conditioning in each channel is performed by a low-noise amplifier (LNA) and a TGC amplifier, as shown in Figure 3.5. The LNA provides a fixed gain of 20 dB to increase the signal level, thus reducing the impact of the electrical noise generated by subsequent stages. In order to achieve an input-referred noise level that is small compared to the transducer’s noise, the majority of the power in the receive circuitry is consumed by the LNA. To meet the strict power constraint, we use a single-ended common-source amplifier with a resistive load, as illustrated in Figure 3.6. The gate of the input transistor ($M_1$) is directly connected to the PZT element. As such, the LNA has a high input impedance and senses the voltage across the transducer. The input bias voltage of the LNA is defined by a diode-connected transistor ($M_{B1}$) and a current source ($I_B$) via a large resistor ($R_B$). Two electrostatic discharge (ESD) protection diodes are added to the input node to prevent the circuits from being damaged by any electrostatic discharge when integrating the ASIC with the transducer matrix.

A PZT transducer element is essentially a bi-electrode device; one electrode is connected to the matching layer and the ground foil shared by other elements in the array, while the signal generated on the other electrode is fed to the input of the LNA. However, a conventional common-source amplifier is inherently single-ended and only senses the voltage difference between the input node and the supply rail. As a result, any interference that appears on the ground foil or the supply rail will also be amplified. To address this problem, an NMOS transistor $M_2$ is added in series with $M_1$, and correspondingly $M_{B2}$ is added to the bias network. Both transistors are built in deep N-wells such that their substrate can be connected to the ground foil, while their gates are connected to the ESD supply (VESD), which is AC coupled to the ground foil with an external capacitor. In this way, any voltage fluctuation on the ground foil will also appear at the source of the PMOS input transistor $M_1$, so that the gate-source voltage of $M_1$ tracks the differential voltage across the transducer element and thus the effect of the ground foil interference is (partially) compensated for. By including $M_2$ and $M_{B2}$, the power-supply rejection ratio (PSRR) of the LNA is also improved.

When compared with its closed-loop counterparts [15, 16], the usage of the simple open-loop topology comes at the cost of large gain errors and poor linearity [17]. As a design choice, we allocate the majority of the gain-error budget (±2 dB) to the LNA. The gain accuracy of the subsequent circuits, mainly the TGC amplifiers, can be well controlled by using closed-loop topologies; they are much less power hungry than the LNA due to the relaxed noise requirements. Moreover, the linearity requirement is relieved by including a bypass path in parallel with the LNA, as shown in Figure 3.5. At the high input-signal levels associated with near-field echoes, the LNA is bypassed to prevent saturation or distortion at the output of the LNA, hence enhancing the dynamic range of the received signal. Since the signal amplitudes in the near field are much higher than the noise level of the subsequent circuitry, the absence of the LNA in this situation does not affect the detection limit of the ASIC.

![Figure 3.6. Circuit diagram of the front-end low-noise amplifier.](image)

A TGC amplifier with four gain steps (0 dB, 12 dB, 26 dB and 40 dB) follows the LNA to further enhance the dynamic range by compensating for the propagation attenuation of the echo signals. In addition, it helps to relax the linearity requirements of the subsequent circuits, especially the micro-beamformer. During operation, the gain of the TGC amplifier is increased linearly in decibels with time. A differential cascoded-flipped-voltage-follower with a switchable degeneration-resistor network, as proposed in [5] and [18], is utilized to perform the time-gain compensation. Discrete gain levels are chosen to simplify the implementation of the amplifier [5].

The LNA and the TGC amplifier are AC coupled by an RC network to reject offset and low-frequency noise and to independently set the input DC bias condition of the TGC amplifier. The corner frequency of this RC network is 2.4 MHz, well below the lower cut-off frequency of the PZT transducer, i.e. 4 MHz.

### 3.3.2 Micro-beamformer

As discussed in Section 3.2, the operation principle of micro-beamforming is to apply relative delays to the echo signals received by different transducer elements in such a way that for a specific incident angle they are properly aligned in time and add up coherently. When implemented in circuits, the relative delays can be realized in the analog or the digital domain. While a digital delay line is able to offer a higher delay resolution and accuracy, the power required for the digitization of the output signal of each element is far beyond the sub-mW power budget per channel [19]. Therefore, in this design, a programmable analog delay line with pipeline-operated sample/hold (S/H) stages [5, 20] is used to construct the micro-beamforming cell. It consists of 8 memory cells operating in a time-interleaved fashion at a sampling rate of 25 MHz, which leads to a tunable delay range from 40 ns to 320 ns with a delay resolution of 40 ns.
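The delay range and resolution follow from the sampling period (1/25 MHz = 40 ns) and the number of memory cells. The following behavioural sketch models the delay line as a ring of 8 cells that is read before it is written in each clock cycle. It is a discrete-time abstraction with hypothetical function names, not a description of the transistor-level circuit.

```python
F_SAMPLE_HZ = 25e6   # sampling rate (from the text)
N_CELLS = 8          # number of memory cells (from the text)


def delay_settings_ns(f_sample_hz=F_SAMPLE_HZ, n_cells=N_CELLS):
    """Available delays in ns: one to n_cells sampling periods."""
    period_ns = 1e9 / f_sample_hz
    return [k * period_ns for k in range(1, n_cells + 1)]


def delay_line(samples, delay_cells, n_cells=N_CELLS):
    """Delay a sample stream by delay_cells periods using a ring of memory cells.

    In every cycle the cell sampled delay_cells cycles earlier is read out
    first, after which the current input overwrites the cell that is due.
    """
    if not 1 <= delay_cells <= n_cells:
        raise ValueError("delay must be between 1 and n_cells periods")
    cells = [0.0] * n_cells
    out = []
    for n, x in enumerate(samples):
        out.append(cells[(n - delay_cells) % n_cells])   # readout phase
        cells[n % n_cells] = x                           # acquisition phase
    return out


print(delay_settings_ns())
print(delay_line([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
```

The maximum delay equals the number of cells, because a cell must be read before the next sample overwrites it.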

The circuit implementation and timing diagram of the proposed micro-beamformer are depicted in Figure 3.7. The micro-beamformer consists of 9 delay lines. In each delay line, the output voltage of the corresponding TGC amplifier is cyclically sampled and held on capacitors $C_1$ to $C_8$ under the control of non-overlapping sampling clocks $S_1$ to $S_8$, while readout clocks $R_1$ to $R_8$ sequentially release the voltages stored on the capacitors to the output node. Thus, the signal delay can be determined by the time shift between the falling edges of the acquisition and the readout clocks. In [5], we presented a similar analog beamformer circuit that employs current-domain signal summation. It requires extra building blocks to perform the voltage-to-current conversion, which degrades the power efficiency. In this work, instead, the choice is made to sum up the signals in the charge domain [20]. As illustrated in Figure 3.7, the outputs of all 9 delay lines are joined together, causing the sampled charge to be averaged among the capacitors that are connected to the output node. This effectively adds the delayed signals with low circuit complexity and high power efficiency.

**Figure 3.7.** Circuit diagram of the charge-domain micro-beamformer.
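The charge-domain summation can be illustrated with a charge-conservation model: when equal capacitors holding different voltages are connected to a common node, the node settles at the charge-weighted average. The sketch below assumes ideal switches and identical cell capacitors. The parasitic capacitance `c_par` is a hypothetical parameter, assumed to be discharged before the capacitors are connected.

```python
def charge_domain_sum(voltages, c_cell=1.0, c_par=0.0):
    """Voltage on the summing node after charge redistribution.

    voltages: held voltages of the connected memory cells
    c_cell:   capacitance of one memory cell (arbitrary units, assumed equal)
    c_par:    HYPOTHETICAL parasitic capacitance on the node, assumed to hold
              no charge before the cells are connected
    """
    total_charge = sum(c_cell * v for v in voltages)
    total_cap = len(voltages) * c_cell + c_par
    return total_charge / total_cap


aligned = [0.10] * 9                    # nine time-aligned samples of one echo
misaligned = [0.10, -0.10, 0.10, -0.10, 0.10, -0.10, 0.10, -0.10, 0.10]

print(charge_domain_sum(aligned))
print(charge_domain_sum(misaligned))
print(charge_domain_sum(aligned, c_par=1.0))
```

With ideal capacitors the output is the average rather than the sum of the nine inputs, which is a scaled version of the coherent sum; a non-zero parasitic capacitance only attenuates the result further.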

In order to eliminate errors due to residual charge stored on the parasitic capacitance associated with the switches and the interconnect, the charge summing node is periodically reset by a switch before each summation cycle, leading to a return-to-zero (RZ) waveform at the output of the micro-beamformer. To recover this signal, a ping-pong S/H circuit is implemented following the micro-beamformer (Figure 3.5), which re-samples the signal at its non-zero phase at a rate of 25 MHz.

### 3.3.3 Auxiliary Circuits

Custom-designed digital circuits are included to generate the delay-control clock pattern for the delay lines. There are two 8-bit shift registers in every receive channel, which are used for the generation of acquisition and readout clocks. During the configuration phase of the ASIC, each shift register can be connected in series with its counterparts in adjacent channels of the same 3 × 3 subarray, forming a daisy-chain that allows the ASIC to be programmed via a serial interface. Thus, the delay patterns can be loaded into the shift registers. Upon the start of the beamforming phase, each 8-bit shift register is re-configured as a circular shift register by feeding back its output to its input. As such, the delay patterns loaded during the configuration phase will re-circulate as long as clocks are applied. Before being connected to the S/H switches in the delay line, the parallel outputs of the shift registers are further processed by logic gates to ensure that the switches are driven by non-overlapping signals.

**Table 3.1.** Power consumption breakdown of the circuit blocks.

<table>
<thead>
<tr>
<th>Circuit block</th>
<th>Power consumption (µW/channel)</th>
<th>Proportion (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LNA</td>
<td>155.7</td>
<td>35.5</td>
</tr>
<tr>
<td>TGC</td>
<td>51.5</td>
<td>11.6</td>
</tr>
<tr>
<td>Micro-beamformer</td>
<td>9.9*</td>
<td>2.2</td>
</tr>
<tr>
<td>Cable driver</td>
<td>91.8**</td>
<td>20.8</td>
</tr>
<tr>
<td>Biasing</td>
<td>13</td>
<td>2.9</td>
</tr>
<tr>
<td>Digital control</td>
<td>120.6</td>
<td>27.3</td>
</tr>
<tr>
<td>Total</td>
<td>442.5</td>
<td>100</td>
</tr>
</tbody>
</table>

* The power consumption of a single delay line; ** 1/9 of the power consumption of a single cable driver.
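The way two circular shift registers can encode a programmable delay is sketched below. The one-hot encoding, with the readout pattern rotated relative to the acquisition pattern, is an assumption made for illustration; the text only states that two 8-bit circular shift registers per channel generate the acquisition and readout clocks. All function names are hypothetical.

```python
def one_hot(position, n_bits=8):
    """One-hot pattern with a single 1 at index position (modulo n_bits)."""
    return [1 if i == position % n_bits else 0 for i in range(n_bits)]


def circulate(pattern, n_cycles):
    """Register contents over n_cycles clocks with the output fed back to the input."""
    state = list(pattern)
    states = []
    for _ in range(n_cycles):
        states.append(tuple(state))
        state = [state[-1]] + state[:-1]   # rotate by one position per clock
    return states


def read_lags(delay_cells, n_bits=8, n_cycles=24):
    """Cycles elapsed between sampling a cell and reading it, per clock cycle."""
    acq = circulate(one_hot(0, n_bits), n_cycles)              # acquisition clocks
    rd = circulate(one_hot(-delay_cells, n_bits), n_cycles)    # readout clocks
    lags = []
    for t in range(n_bits, n_cycles):
        cell = rd[t].index(1)                                  # cell read in cycle t
        last_write = max(s for s in range(t) if acq[s].index(1) == cell)
        lags.append(t - last_write)
    return lags


print([read_lags(k)[0] for k in range(1, 9)])
```

In this model a delay of 8 cells makes the readout and acquisition clocks address the same cell in the same cycle, so the readout has to take place before the acquisition, which is consistent with the need for non-overlapping switch signals.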

As mentioned earlier, the ASIC is directly loaded by the micro-coaxial cables. For the given cable length (1 m to 2 m) and working frequency (around 5 MHz), the combination of a cable and a characteristic load can be modelled as a lumped RC network with a resistance of 50 Ω to 100 Ω and a parallel capacitance of 100 pF to 200 pF. To drive such a heavy load, a cable driver with high power efficiency is required. In this design, we choose a class-AB super source follower [21] as the output voltage buffer, which is capable of providing both a high slew-rate and low stand-by power consumption.
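To give a feeling for how heavy this load is, the sketch below evaluates the impedance of the lumped parallel RC model at 5 MHz for the two corner cases quoted above. Only the component ranges and the frequency come from the text; the function name and the choice of corner cases are illustrative.

```python
import math


def parallel_rc_impedance(r_ohm, c_farad, f_hz):
    """Complex impedance of a resistor in parallel with a capacitor."""
    omega = 2.0 * math.pi * f_hz
    return r_ohm / complex(1.0, omega * r_ohm * c_farad)


F_HZ = 5e6   # working frequency (from the text)

# Corner cases of the lumped model given in the text.
for r_ohm, c_farad in [(50.0, 100e-12), (100.0, 200e-12)]:
    z = parallel_rc_impedance(r_ohm, c_farad, F_HZ)
    print(f"R = {r_ohm:.0f} ohm, C = {c_farad * 1e12:.0f} pF: "
          f"|Z| = {abs(z):.1f} ohm, phase = {math.degrees(math.atan2(z.imag, z.real)):.1f} deg")
```

In both cases the load stays below 100 Ω in magnitude, so the driver has to source and sink signal currents that are large compared to its quiescent current, which motivates a class-AB output stage.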

### 3.3.4 Silicon Realization

The ASIC has been fabricated in a standard 0.18 μm CMOS process with a supply voltage of 1.8 V for both analog and digital circuits, except for the ESD protection and bias circuits of the LNAs, which use a 2.8 V supply (from which very little current is drawn). Figure 3.8 illustrates the layout and the microphotograph of the ASIC. The dimensions of the ASIC are 3.2 mm × 3.8 mm. It consists of the 9 × 9 receive channels, each of which consists of an LNA, a TGC amplifier, an AC coupler, an 8-memory-cell delay line and the associated digital control circuits. All circuits in an individual channel are laid out within a square area of 200 µm × 200 µm, as illustrated in Figure 3.8c. A bond-pad with a size of 60 µm × 60 µm is implemented in the top metal layer on top of the LNA for creating the connection to the corresponding transducer element. The cable drivers are laid out beside the centre array and close to the I/O bondpads.

Table 3.1 lists a power consumption breakdown of all circuit blocks. While operating, the receive ASIC consumes only 35.5 mW, corresponding to 0.44 mW for each channel. In a full 1000-element miniature TEE probe, this would translate to about 440 mW, which is within the power budget for TEE probes.
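The per-channel total and its extrapolation to a full probe can be checked from the per-block figures in the power breakdown table. The sketch below assumes that power scales linearly with the number of channels and compares the result with the 0.5 W ASIC budget stated in the introduction of this chapter; the variable names are illustrative.

```python
# Per-channel power of each circuit block in microwatts (values from the table).
BLOCK_POWER_UW = {
    "LNA": 155.7,
    "TGC": 51.5,
    "Micro-beamformer": 9.9,    # single delay line
    "Cable driver": 91.8,       # 1/9 of one cable driver
    "Biasing": 13.0,
    "Digital control": 120.6,
}

ASIC_BUDGET_MW = 500.0   # 0.5 W budget stated earlier in the chapter
FULL_PROBE_CHANNELS = 1000

total_uw = sum(BLOCK_POWER_UW.values())
# ASSUMPTION: power scales linearly with the number of channels.
full_probe_mw = total_uw * FULL_PROBE_CHANNELS / 1e3

print(f"per channel: {total_uw:.1f} uW")
print(f"{FULL_PROBE_CHANNELS} channels: {full_probe_mw:.1f} mW "
      f"(budget {ASIC_BUDGET_MW:.0f} mW)")
```

The LNA and the digital control logic together account for more than half of the per-channel power, so they are the natural targets for further optimization.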

### 3.4 Experimental Results

#### 3.4.1 Fabricated Prototype

Figure 3.9a shows a photograph of the fabricated PZT matrix that is mounted on a receive ASIC using the integration solution described in Section 3.2. To facilitate water-tank measurements, the prototype device has been bonded on a PCB, as shown in Figure 3.9b, and has subsequently been encapsulated in non-conductive silicone rubber.

#### 3.4.2 Acoustic Measurement Setup

To evaluate the acoustical performance of the prototype, the encapsulated PCB was immersed in a water tank in front of a 5-MHz, 0.5-inch single-element transmit transducer (V309, Olympus Scientific Solutions, Waltham, MA) that was used to generate acoustic test signals, as shown in Figure 3.10. The prototype was placed at the focus of the transmit transducer, which is 10 cm away from the transducer’s surface. In order to emulate echoes arriving at different angles, the prototype was mounted on a rotational stage capable of sweeping its angle relative to the axis of the transmit transducer between $-90^\circ$ and $+90^\circ$ in steps of 0.1° (see the inset of Figure 3.14). Periodic test pulses with varying frequency, duration and amplitude were generated using an arbitrary waveform generator (AWG) (33250A, Keysight Technologies, Santa Clara, CA) and delivered to the transmit transducer via a benchtop attenuator (50BR-036, JFW Industries, Indianapolis, IN) with tunable attenuation from 0 dB to 110 dB, and an RF power amplifier (403LA, Electronics & Innovation, Rochester, NY) with a fixed gain of 37 dB. The output signals of the ASIC were captured using a digital oscilloscope (DSOX4054A, Keysight Technologies, Santa Clara, CA). The ASIC was programmed using an FPGA on a custom-designed mother PCB.

![Figure 3.9](image-url)

To calibrate the pressure generated by the transmit transducer, a 0.2 mm needle hydrophone with integrated pre-amplifier (Precision Acoustics, Dorchester, UK) was placed at the position of the prototype prior to the experiments. Using the AWG, the transmit transducer was driven with a 20-cycle sine wave, the frequency of which was swept from 2 MHz to 8 MHz in steps of 50 kHz. The resulting acoustic pressure was recorded using the hydrophone to derive the transfer function of the transmit transducer, which was found to have a bandwidth of 80%, a centre frequency of around 5.8 MHz, and an efficiency of 0.56 kPa/V at focus for 5 MHz.

#### 3.4.3 Single-element Characterization

In order to determine the transfer function of the individual elements of the prototype, the transmit transducer was driven in the same way as during the hydrophone calibration, while the voltage signals induced across the individual elements of the prototype array, oriented at 0° relative to the transmit transducer, were recorded successively using the oscilloscope. After correcting for the transfer function of the transmit transducer, these measurements were used to determine the frequency response of the individual elements. However, it should be noted that the capacitive loading by the measurement cable and the measurement equipment has not been compensated for, which leads to a large attenuation of the measured signal owing to the impedance mismatch between the single element and the measurement equipment. Given that the attenuation increases with frequency, this influences the measured transfer function.

In Figure 3.11, the measured frequency response of 9 individual piezoelectric elements is plotted as a function of frequency. The relative differences between the individual transfer functions are less than 3 dB within the frequency range from 3.5 MHz to 6.7 MHz, showing good reproducibility of the manufacturing process. The elements have a centre frequency of 5.5 MHz, a -3-dB bandwidth of about 40%, and a sensitivity of 0.2 μV/Pa. The bandwidth is quite limited but adequate for TEE imaging, where harmonic imaging is less important since there is no clutter from the skin, ribs, fat layers, etc. The measured element sensitivity appears very low due to the attenuation of the signal by the capacitive loading of the coaxial cables connected to the elements. The capacitance of a single piezoelectric element is approximately 1 pF, while the length of the coaxial cables used in the experiment is 1 m, corresponding to a capacitive load of about 100 pF. Thus, the measured single-element sensitivity is about 40 dB lower than the unloaded sensitivity.
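The quoted loss of about 40 dB can be checked by treating the element and the cable as a capacitive voltage divider. The following is a minimal sketch under that assumption (an ideal divider without resistive losses); the function name is illustrative and not taken from the thesis.

```python
import math

def loading_attenuation_db(c_element_pf: float, c_load_pf: float) -> float:
    """Attenuation (dB) of a capacitive source driving a capacitive load.

    Assumes an ideal capacitive divider:
    V_out / V_in = C_element / (C_element + C_load).
    """
    ratio = c_element_pf / (c_element_pf + c_load_pf)
    return 20.0 * math.log10(ratio)

# Values from the text: ~1 pF element, ~100 pF for 1 m of coaxial cable.
attenuation = loading_attenuation_db(1.0, 100.0)
print(f"Loading attenuation: {attenuation:.1f} dB")
```

With the values given in the text, the divider ratio is 1/101, which is consistent with the stated loss of roughly 40 dB.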

#### 3.4.4 Micro-beamforming: Frequency Response

The transfer function of the signal-conditioning and beamforming circuits on the ASIC was first measured electrically by applying a sinusoidal test voltage to an ASIC without transducer matrix, of which all transducer bond-pads of a $3 \times 3$ subarray have been wire-bonded for electrical testing. The frequency of the test signal was swept from 0.38 MHz to 10 MHz while the output voltage of the corresponding subarray circuit of the ASIC was captured using the oscilloscope. No beam steering was applied to the micro-beamformer. The resulting frequency response is shown in Figure 3.12, where Gain 1-7 represent various combinations of gain settings of the ASIC’s signal-conditioning circuits (LNAs and TGCs). The measured gain at 5 MHz varies from -11 dB to 34 dB, corresponding to a 45 dB gain range.

The measured gain steps are in good agreement with the design values. The absolute gain levels are approximately 10 dB lower than what would be expected based on the combined gain of the LNA and the TGC. This difference can be attributed to the small attenuation associated with several source followers in the ASIC that act as signal buffers and with the parasitic capacitance at the charge summing node of the micro-beamformer. It has no significant impact on the functionality of the ASIC.

To determine the transfer function from an acoustical input signal to a micro-beamformed electrical output, a similar approach was used as for the single-element characterization. The LNA and TGC were configured to the lowest gain setting (Gain 1 in Figure 3.12), and the micro-beamformer was configured to pre-steer at 0°. The results are shown in Figure 3.13. All 9 micro-beamformed subarrays in the prototype have a similar response, with a variation of less than 3 dB among the micro-beamformers. Since the bandwidth of the ASIC exceeds that of the transducer elements, the centre frequency of 5.5 MHz and the -3 dB bandwidth of 40% are virtually identical to those found for the individual elements.

However, compared with the frequency response of individual elements as given in Figure 3.11, a notable difference is found in the spectrum around 7 MHz, where a second resonance peak is present in the frequency response of the micro-beamformed outputs of the subarrays. A similar resonance peak is found using PZFlex simulations of an individual transducer element, but cannot be observed in the measured frequency response of the individual elements. Simulations using a KLM model [22] of a transducer element show that the absence of this peak can be qualitatively explained by the fact that the single elements in Section 3.4.3 are loaded by a coaxial cable, which causes an attenuation that increases with frequency and thus masks the resonance peak. The elements connected to the circuitry of the ASIC, in contrast, are only loaded by the relatively small input capacitance of the LNA. Their output signal is therefore much less attenuated and reveals the second resonance peak.

#### 3.4.5 Micro-beamforming: Steering Response

To characterise the ASIC’s micro-beamforming function, three different delay patterns were programmed: with equal delays, 40 ns delay steps, and 80 ns delay steps between adjacent elements, corresponding to pre-steering angles of 0°, 17° and 37°, respectively. In this way, the possible beams of the 3 × 3 subarrays will have a sufficient overlap to cover the whole region of interest. The effect of micro-beamforming is illustrated in Figure 3.14 by recording the micro-beamformer output for two different delay patterns when the prototype is positioned at an angle of 37° to the transmitter. The micro-beamformed output corresponding to a delay equivalent to 37° steering is significantly higher than the output corresponding to 0°, demonstrating that the micro-beamformer works well.

In order to obtain the beam profile, the transmit transducer was driven using a 20-cycle sinusoidal pulse of 5 MHz, while the angle of the prototype relative to the transmit transducer was swept from −90° to +90° using the rotational stage. The micro-beamformer output voltage of the central subarray was recorded using the oscilloscope. The results are shown in Figure 3.15, along with the beam profile of a 3 × 3 array simulated with Field II [23, 24] for the same delay pattern, assuming a piston-like transducer and a continuous-wave (CW) excitation. Given that the 20-cycle excitation in our measurement is almost a CW excitation, we observe that the theoretical values agree quite well with the measurement results, especially regarding the position of the grating lobes and side lobes. There is also reasonable agreement in the level of the grating lobes between the theory and the measurement. In addition, the measured beam-width of a 3 × 3 subarray is almost identical to the expected beam-width of an element with the same size, which indicates that the crosstalk between elements of neighbouring subarrays is very small.

**Figure 3.15.** Measured and simulated beam profile of a 3 × 3 micro-beamformed subarray excited by a 20-cycle sinusoidal pulse at 5 MHz, for steering angles of 0°, 17° and 37°.

**Figure 3.16.** Beam profiles, simulated with Field II, for a 32 × 32 array excited by a 3-cycle sinusoidal pulse at 5 MHz, for steering angles of 0°, 17° and 37°.

While Figure 3.15 demonstrates the proper operation of the micro-beamformer, the results for the 3 × 3 element subarray are not easily extendable to a 32 × 32 element array of a full TEE probe. In general, for a large array that is operated in pulsed mode, the radiation pattern will be concentrated more around the main beam and the grating lobes will be much lower than in the foregoing results. The beam profile of such a 32 × 32 array operating in pulsed mode (for a 3-cycle sinusoidal pulse at 5 MHz) for the three steering angles was simulated using Field II, the results of which are shown in Figure 3.16. These concur with the expectation for a large array, and the grating lobes for the extreme steering angle, i.e. 37°, are about 20 dB lower than the grating lobes from the 3 × 3 array simulations.

#### 3.4.6 Dynamic Range

The dynamic range is the ratio between the maximum and the minimum pressure that can be detected with the prototype. In order to determine the overall dynamic range, an acoustic measurement was performed in which the voltage output of the AWG was swept from 0.01 V to 10 V, and, moreover, different attenuations were applied using the benchtop attenuator. Unlike in previous experiments, a 3-cycle 5 MHz sinusoidal pulse was used, to closely resemble the signals used in imaging systems. Thus, acoustic pressures ranging from 0.8 Pa to 180 kPa were generated at the surface of the prototype’s central subarray, while the corresponding micro-beamformer output was recorded for the various gain settings of the signal-conditioning circuits.

The results from this experiment are shown in Figure 3.17, where the peak-to-peak voltage received by the central subarray is plotted against the peak-to-peak acoustic pressure incident on the surface of the subarray. Keeping in mind the bandwidth of the transmitted pulse (4 MHz to 6 MHz for a 3-cycle sinusoidal pulse), all signals recorded during the course of this experiment were filtered in MATLAB using a bandpass filter between 4 MHz and 6 MHz to remove any out-of-band noise. In order to obtain a flat pass band as well as a steep roll-off beyond the cut-off frequencies, a 15th-order Butterworth filter was chosen.

From these measurements, the sensitivity of a single transducer element is found to be about 9 µV/Pa. This is much higher than the sensitivity of 0.2 µV/Pa reported in Section 3.4.3, because the elements are not loaded by a cable, but only by the LNA. The measured sensitivity is in reasonable agreement with the (unloaded) sensitivity of 15 µV/Pa simulated using PZFlex.

![Figure 3.17](image)  
**Figure 3.17.** Measured output voltage (peak-to-peak) as a function of the pressure at the surface of the transducer matrix (peak-to-peak), for the different TGC gain settings, demonstrating the dynamic range of the ASIC; results obtained using a 3-cycle sinusoidal pulse at 5 MHz.

The minimum detectable pressure is limited by noise. As shown in Figure 3.17, its peak-to-peak value is approximately 7 Pa. A more accurate value is obtained by determining the rms output noise level at the highest gain setting, which, when referred to the input of a single element as a pressure noise using the measured sensitivity, corresponds to approximately 2.4 Pa. This is in reasonable agreement with the value expected based on the measured input-referred voltage noise of the ASIC and the Johnson-Nyquist noise associated with the resistive part of the transducer’s impedance, which together correspond to 2.1 Pa.

The maximum pressure that can be detected is about 50 kPa and is limited by the saturation of the signal in the ASIC, which occurs approximately when the peak-to-peak output voltage of the micro-beamformer reaches 200 mV. The dynamic range thus found is around 77 dB.
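As a quick consistency check, the 77 dB figure matches the ratio between the maximum detectable pressure of about 50 kPa and the peak-to-peak noise floor of about 7 Pa read from Figure 3.17. That this is the pair of values behind the reported number is an inference from the text, and the helper below is a hypothetical sketch rather than part of the measurement flow.

```python
import math

def dynamic_range_db(p_max_pa: float, p_min_pa: float) -> float:
    """Dynamic range in dB for a ratio of two pressure amplitudes."""
    return 20.0 * math.log10(p_max_pa / p_min_pa)

# Values from the text: ~50 kPa saturation limit, ~7 Pa peak-to-peak noise floor.
dr = dynamic_range_db(50e3, 7.0)
print(f"Dynamic range: {dr:.1f} dB")
```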

### 3.5 Conclusions

A prototype PZT matrix with an integrated receive ASIC that is targeted for real-time 3-D TEE imaging has been presented in this chapter. The $9 \times 12$ matrix transducer employs the subarray beamforming technique to significantly reduce the number of external signal channels. Through the experimental evaluation of this prototype, we have delivered a proof of the effectiveness of the proposed techniques, including the array fabrication, the signal processing method, the integration approach and the circuit implementation. The proposed architecture may be scaled up to realize a full TEE transducer capable of making real-time 3-D images.

The receive ASIC, which integrates both front-end signal conditioning circuits and micro-beamformers, has been fabricated in a 0.18 µm CMOS process. The signal conditioning circuit for each transducer element includes a single-ended LNA, which can be bypassed, and a quad-level TGC amplifier, the combination of which provides a programmable gain range of 45 dB. A micro-beamformer circuit utilizing pipeline-operated sample/hold delay lines and charge-domain summation is implemented in each subarray. It is capable of providing a delay resolution of 40 ns and a delay depth of 280 ns to steer the beam in azimuthal and elevational directions over $\pm 37^\circ$. The power-efficiency of all circuit blocks has been carefully optimized, leading to an average power consumption of 0.44 mW/channel.

The acoustic characteristics of a single transducer element, which has the silicon-based ASIC and the interconnection materials as the backing layer, have been evaluated with both simulations and experimental measurements. The measured response of the single element deviates quite significantly from the simulation, especially in its sensitivity, due to the capacitive loading of the single element by the measurement equipment. This effect is not observed in the case of the micro-beamformed subarrays, which indicates the positive effect of PZT-on-CMOS integration on the system performance.

A water-tank measurement has been performed to verify the functionality of the prototype and evaluate its performance. Measurement results confirm the effectiveness of the micro-beamforming and the PZT-on-CMOS integration approach. The results also show more than an order of magnitude increase in sensitivity between the single element and the subgroup, indicating that the ASIC prevents the signal attenuation that would occur if the transducer elements were directly loaded by the micro-coaxial cables. This allows the prototype to achieve a measured dynamic range of 77 dB.

The promising measurement results obtained from the evaluation of the prototype PZT matrix encourage the further up-scaling of both the transducer matrix and the ASIC. Based on these results, a fully-populated matrix involving more than 1000 elements and an improved co-integrated ASIC will be implemented in our future work [25].

References


CHAPTER 4

FRONT-END ASIC WITH ANALOG SUBARRAY RX BEAMFORMING


### 4.1 Introduction

Volumetric visualization of the human heart is essential for the accurate diagnosis of cardiovascular diseases and the guidance of interventional cardiac procedures. Echocardiography, which images the heart using ultrasound, has become an indispensable modality in cardiology because it is safe, relatively inexpensive and capable of providing real-time images. Transesophageal echocardiography (TEE), as its name indicates, generates ultrasonic images from the esophagus, by utilizing an ultrasound transducer array mounted at the tip of a gastroscopic tube (Figure 4.1). Conventionally, the elements of the transducer array are connected using micro-coaxial cables to an external imaging system, where properly-timed high-voltage pulses are generated to transmit an acoustic pulse, and the resulting echoes are recorded and processed to form an image.

2-D TEE probes are widely used in clinical practice. They employ a 1-D phased-array transducer to obtain cross-sectional images of the heart. However, such 2-D images often fall short in providing comprehensive visual information for complex cardiac interventions, such as minimally-invasive valve replacements and septal-defect closures. Appropriate real-time 3-D imaging would be very beneficial for improving the success rate of such procedures [1].

The relatively large probe heads (typically > 10 cm$^3$) of current 3-D TEE probes cannot be tolerated by the patient during longer procedures (unless general anesthesia is applied) and are too large for pediatric use. For longer-term monitoring and pediatric use, the volume of the probe tip should be constrained to an upper limit of 1 cm$^3$, and the tube diameter to 5 mm to 7 mm [2]. To enable real-time 3-D imaging, a 2-D phased array is required. For an array of aperture size $D \times D$, the achievable signal-to-noise ratio (SNR) and the lateral resolution both improve linearly with $D$. Therefore, it is desirable to make full use of the available array aperture within the probe tip (5 × 5 mm$^2$). In addition, the pitch of the transducer elements should not exceed half of the acoustic wavelength ($\lambda$) to minimize grating lobes and to ensure proper spatial imaging resolution [3]. For a 2-D array with a center frequency of 5 MHz, this corresponds to a pitch of 150 µm, leading to at least 32 × 32 elements. Accommodating the corresponding number of micro-coaxial cables within the narrow gastroscopic tube is difficult or even impossible. Decreasing the aperture size to reduce the number of channels will lead to a significant deterioration in both the SNR and the lateral resolution. As a result, channel reduction should be performed locally to reduce the number of cables with the aid of miniaturized in-probe electronics [4].
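The pitch and element-count figures above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes a speed of sound of 1540 m/s, a common textbook value for soft tissue that is not stated in this section; all constant names are illustrative.

```python
SPEED_OF_SOUND = 1540.0  # m/s, assumed value for soft tissue (not from the text)
CENTER_FREQ = 5e6        # Hz, center frequency from the text
APERTURE = 5e-3          # m, available aperture side length from the text
PITCH = 150e-6           # m, element pitch from the text

wavelength = SPEED_OF_SOUND / CENTER_FREQ
half_wavelength_um = wavelength / 2 * 1e6

# Number of whole elements that fit along one side of the aperture.
elements_per_side = int(APERTURE / PITCH)

print(f"lambda/2 = {half_wavelength_um:.0f} um")
print(f"elements per side at 150 um pitch: {elements_per_side}")
```

Under this assumption the half-wavelength limit is close to the 150 µm pitch, and a 5 mm aperture holds slightly more than 32 elements per side, in line with the 32 × 32 array mentioned in the text.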

A variety of approaches have been proposed to reduce the cable count in endoscopic and catheter-based ultrasound systems. Part of the beamforming function, which is conventionally performed in the external imaging system to achieve spatial directivity and enhance the signal-to-noise ratio, can be moved into the probe [5, 6]. Time-division multiplexing approaches have been applied in [7, 8] to allow multiple elements to share a single cable. Solutions based on element-switching schemes [9, 10] have also been reported. All these approaches rely on the realization of a front-end ASIC that is closely integrated with the transducer array.

Design of such front-end ASICs is challenging in several aspects. First, the power consumption of the ASIC, which contributes to the overall self-heating of the probe, should be kept below an estimated 0.5 W [11], to avoid excessive tissue temperature rise [12]. This translates to 0.5 mW/element for a 1000-element array and is beyond the state-of-the-art of front-end ultrasound ASICs, which consume at least 1.4 mW/element [10, 13, 14]. Another challenge comes from the dense interconnection between the ASIC and the transducer array. Direct transducer-on-chip integration is desired, as it not only helps to get a small form factor, but also reduces the parasitic interconnect capacitance added to each transducer element. This calls for an element-matched ASIC layout, with a pitch identical to that of transducer elements. As a result, a highly compact circuit implementation for the ASIC is called for. Prior works [13, 15] compromised somewhat on the imaging quality by opting for a slightly larger pitch. Indirect transducer-to-chip integration via interposer PCBs [6, 10] allows the use of a different pitch for the transducer array and the ASIC. However, the limited space within the TEE probe tip precludes this option.

In this chapter, we present a front-end ASIC that is optimized in both system architecture and circuit-level implementation to meet the stringent requirements of 3-D TEE probes [16]. It is directly integrated with an array of $32 \times 32$ piezoelectric transducer elements, which are split into a transmit and a receive array to facilitate the power and area optimization of the ASIC [17]. The receive elements are further divided into 96 subarrays, each with a switched-capacitor-based beamformer, to realize a 9-fold cable reduction. In addition, an ultra-low-power LNA architecture [18], which incorporates an inverter-based OTA with a bias scheme tailored for ultrasound imaging, is proposed to increase the power-efficiency of the receive circuitry, while keeping the area compact. Furthermore, a mismatch-scrambling technique is applied to mitigate the effects of mismatch between the beamformer stages, and thus improve the overall dynamic range of the ASIC while receiving. These circuit techniques, while designed for PZT matrix transducers, are also relevant for other types of ultrasound transducers, such as CMUTs. The functionality of the ASIC as well as the effectiveness of the proposed techniques have been successfully demonstrated by imaging experiments.

The chapter is organized as follows. Section 4.2 describes the proposed system architecture. Section 4.3 discusses the details of the circuit implementation. Experimental results are presented in Section 4.4. Conclusions are given at the end of the chapter.

### 4.2 System Architecture

#### 4.2.1 Transducer Matrix Configuration

In conventional ultrasound probes, each transducer element is used both as transmitter and receiver. A high-voltage CMOS process is then needed to generate the transmit pulses of typically tens of volts [14]. The integration density of high-voltage processes is generally lower than that of their low-voltage counterparts with the same feature size, which is disadvantageous for ASICs that directly interconnect with 2-D transducer arrays with a tiny element pitch.

In this work, we use an array of $32 \times 32$ PZT elements with separate transmit and receive elements (Figure 4.2). An $8 \times 8$ central subarray is directly wired out to transmit channels in the external imaging system using metal traces in the ASIC that run underneath 96 un-connected elements to bond-pads on the chip’s periphery. These traces are not connected to any junctions in the substrate, and can hence support high transmit voltages provided that they are sufficiently spaced to prevent dielectric breakdown and routed in the top metal layers to minimize capacitive coupling to the substrate. All other 864 elements are connected directly to on-chip receiver circuits, whose outputs are fed to the imaging system’s receive channels.

The use of a small central transmit array helps in reducing the overall cable count as well as obtaining a large opening angle while receiving. With respect to a conventional array configuration in which each transducer element is used for both transmit and receive, our scheme trades lateral resolution for a higher frame rate. In our scanning procedure, the transmitter is used to generate only a few wide beams, illuminating an area that can accommodate a number of parallel receive beams per transmit pulse, thus yielding a high frame rate. On the other hand, it should also be ensured that the generated acoustic pressure is adequate for the target imaging depth. According to our numerical simulations in PZFlex, 64 elements should be capable of generating sufficient pressure for an imaging depth of up to 10 cm. Moreover, despite the missing elements in the receiver aperture, the point-spread-function (PSF) is comparable with that of a fully-populated receiver, as shown by simulations in [19]. This configuration allows the use of a dense low-voltage CMOS technology, thus saving power and circuit area. Compared to [13], which uses the majority of elements to transmit and a sparse array to receive, it achieves better receiving sensitivity as well as lower side-lobes. Moreover, it also helps to reduce the overall in-probe heat dissipation, as transmit circuits normally consume more power [10].

The transducer array was constructed by dicing a bulk piezoelectric material (CTS 3203 HD) into a matrix. It is directly mounted on top of the front-end ASIC using the PZT-on-CMOS integration scheme described in [11]. The PZT matrix measures 4.8 mm × 4.8 mm with an element pitch of 150 µm and a dicing kerf width of 20 µm. It was designed for a center frequency of 5 MHz and a 50% bandwidth (3.75 MHz to 6.25 MHz).

#### 4.2.2 Subarray Beamforming in Receive

The cable-count reduction approach that we adopted in this work is to perform partial receive beamforming in the ASIC. The basic principle of ultrasound beamforming is to apply appropriate relative delays to the received signals in such a way that ultrasound waves coming from the focal point arrive simultaneously and can be constructively combined. Full-array beamforming for 32 × 32 transducer elements is impractical for circuit implementation due to the large delay depth required for each element, which is typically a few microseconds. The subarray beamforming scheme [5], also known as “micro-beamforming” [17], mitigates this issue by dividing the beamforming task into two steps. A coarse delay that is common to all elements within one subarray is applied in the external imaging system, while only the fine delays for the individual elements (less than 1 µs) are applied by the subarray beamformers in the ASIC, which significantly reduces the implementation complexity of the required on-chip delay lines.
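The two-step split can be illustrated with a simplified one-dimensional example. The sketch below assumes plane-wave (far-field) steering, so the required delay grows linearly along a row of elements; the row length of 30 elements and the 60 ns inter-element delay are hypothetical values chosen for illustration, and the function is not taken from the ASIC design.

```python
SUBARRAY_SIDE = 3  # elements per subarray along one axis (3 x 3 subarrays)

def split_delays(n_elements: int, step_ns: int):
    """Split linear per-element delays into coarse and fine parts (in ns).

    Element i needs a total delay of i * step_ns (plane-wave steering).
    The coarse part is the delay of the first element of its subarray and
    is shared by the whole subarray; the fine part is the remainder.
    """
    total = [i * step_ns for i in range(n_elements)]
    coarse = [(i // SUBARRAY_SIDE) * SUBARRAY_SIDE * step_ns
              for i in range(n_elements)]
    fine = [t - c for t, c in zip(total, coarse)]
    return total, coarse, fine

# Hypothetical 1-D row of 30 elements with 60 ns between adjacent elements.
total, coarse, fine = split_delays(30, 60)
print(max(total), max(fine))  # prints 1740 120
```

In this example the full-array delay depth exceeds 1 µs, whereas the delay that has to be realized on chip stays at 120 ns, which is the motivation for leaving the coarse delays to the external imaging system.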

The subarray size is determined based on the following concerns. First, in order to keep the symmetry of the beamforming in lateral and elevation directions, a square subarray is desired. Besides, a larger subarray brings a more aggressive cable-count reduction, but comes at the cost of an elevated grating-lobe level and a greater maximum fine delay in the subarray beamformers. We selected a 3 × 3 configuration to achieve a reasonable acoustic imaging quality, while reducing the number of cables by a factor of 9 [20]. Accordingly, the 864 receive elements of the transducer matrix are divided into 96 subarrays and interfaced with 96 subarray receiver circuits in the ASIC.
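The element bookkeeping behind these numbers can be written out explicitly. The counts below are taken from Section 4.2.1 and this paragraph; the variable names are illustrative.

```python
N_SIDE = 32         # elements per side of the full matrix
TX_SIDE = 8         # side of the central transmit subarray
UNCONNECTED = 96    # elements above the transmit routing traces
SUBARRAY_SIDE = 3   # side of one receive subarray

total_elements = N_SIDE ** 2                              # 1024
tx_elements = TX_SIDE ** 2                                # 64
rx_elements = total_elements - tx_elements - UNCONNECTED  # 864

# Each 3 x 3 subarray shares one output cable.
elements_per_subarray = SUBARRAY_SIDE ** 2
rx_subarrays, leftover = divmod(rx_elements, elements_per_subarray)

print(rx_elements, rx_subarrays, leftover)  # prints 864 96 0
```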

The fine delays are programmable in steps of 30 ns up to 210 ns, allowing the subarray’s directivity to be steered over angles of 0°, ±17°, and ±37° in both azimuthal and elevation directions [11]. All subarrays can be programmed identically, which is appropriate for far-field beamforming and requires loading of only 9 delay settings into the ASIC, which has a negligible impact on the frame rate. The ASIC is also equipped with a mode in which all subarrays can be programmed individually (i.e. 96 × 9 settings), allowing near-field focusing at the expense of a longer programming time, and hence a slightly slower frame rate.
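The link between the delay step and the steering angles can be sketched with the standard plane-wave relation $\sin\theta = c\,\Delta t / p$, where $\Delta t$ is the delay between adjacent elements and $p$ the element pitch. The speed of sound of 1500 m/s used below is an assumption (it is not stated in this section), and the function name is illustrative.

```python
import math

def steering_angle_deg(delay_step_s: float, pitch_m: float,
                       c: float = 1500.0) -> float:
    """Steering angle for a given delay between adjacent elements.

    Uses sin(theta) = c * delay_step / pitch; c = 1500 m/s is an assumed
    speed of sound, not a value taken from the text.
    """
    return math.degrees(math.asin(c * delay_step_s / pitch_m))

# Inter-element delays of 0, 1 and 2 steps of 30 ns at a 150 um pitch.
angles = [steering_angle_deg(k * 30e-9, 150e-6) for k in (0, 1, 2)]
print([round(a, 1) for a in angles])
```

Under this assumption, one and two delay steps between adjacent elements give angles close to the quoted 17° and 37°.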

### 4.3 Circuit Implementation

Figure 4.3 shows the schematic of a 3 × 3 subarray receiver. It consists of 9 LNAs, 9 buffers, 9 analog delay lines, a programmable-gain amplifier (PGA) and a cable driver. A pair of protection diodes is implemented at the input of the LNA to prevent the input from exceeding the supply voltages by more than a diode drop. The LNA output is AC-coupled to a flipped source follower buffer that drives the analog delay line. The joint output of all 9 analog delay lines is then amplified by the PGA. A cable driver buffers the output signal of the PGA to drive the micro-coaxial cable connecting to the imaging system. A local bias circuit (not shown) is implemented within each subarray.

The echo signals received by the transducer elements have a dynamic range of about 80 dB, 40 dB of which is associated with the fact that echoes from deeper tissue are attenuated more along their propagation path. The gains of the LNA and the PGA are programmable to compensate for this attenuation. The LNA is optimized for a low noise figure (< 3 dB) and provides a voltage gain up to 24 dB, to attenuate the impact of noise of the subsequent stages at small signal levels. The gain can be reduced to 6 dB and -12 dB to avoid output saturation at high signal levels. The PGA provides an additional switchable gain with finer steps (0, 6, 12 dB) to interpolate between the gain steps of the LNA. Thus, an overall dynamic range of more than 80 dB, which is sufficient for TEE imaging, can be achieved.
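How the PGA interpolates between the LNA gain steps can be seen by enumerating all combinations of the gain settings listed above. This is a small illustrative sketch; it only adds the nominal gains in dB and ignores any loss in the buffers or delay lines.

```python
from itertools import product

LNA_GAINS_DB = (-12, 6, 24)  # LNA gain settings from the text
PGA_GAINS_DB = (0, 6, 12)    # PGA gain settings from the text

# All distinct overall gains obtained by combining one LNA and one PGA setting.
combined = sorted({lna + pga for lna, pga in product(LNA_GAINS_DB, PGA_GAINS_DB)})
steps = [b - a for a, b in zip(combined, combined[1:])]

print(combined)  # prints [-12, -6, 0, 6, 12, 18, 24, 30, 36]
print(steps)     # prints [6, 6, 6, 6, 6, 6, 6, 6]
```

The 18 dB LNA steps are thus subdivided into uniform 6 dB steps, giving nine nominal gain levels spanning 48 dB.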

As described in Section 4.1, all the above circuits, along with their biasing and digital control circuits, must be implemented within the area of a 3 × 3 subarray, i.e. 450 μm × 450 μm, while consuming less than 4.5 mW. Dedicated circuit techniques have been applied to meet these requirements, which will be discussed in this section.
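The area and power budget per subarray follows directly from the element pitch and the per-element power target of Section 4.1. A minimal sketch of that arithmetic, with illustrative variable names:

```python
PITCH_UM = 150              # element pitch in micrometres
SUBARRAY_SIDE = 3           # 3 x 3 elements per subarray
POWER_PER_ELEMENT_MW = 0.5  # per-element power target from Section 4.1

subarray_side_um = SUBARRAY_SIDE * PITCH_UM                    # 450 um
subarray_power_mw = SUBARRAY_SIDE ** 2 * POWER_PER_ELEMENT_MW  # 4.5 mW

print(f"{subarray_side_um} um x {subarray_side_um} um, {subarray_power_mw} mW")
```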

#### 4.3.1 LNA

As explained in Chapter 2, the choice of the ultrasound LNA topology is dictated by the electrical impedance of the target transducer. Trans-impedance amplifiers (TIA) are widely used in readout ICs for CMUT transducers because of their relatively high impedance [21]. However, a similarly-sized PZT transducer has a much lower impedance around the resonance frequency, typically a couple of kΩ for our transducers (Figure 4.4). In view of this, the TIA topology falls short in achieving an optimal noise/power trade-off, since creating a low enough input impedance requires extra power spent on increasing the open-loop gain, rather than on suppressing the input-referred noise [18]. In this work, instead, we use a capacitive-feedback voltage amplifier\(^9\), shown in Figure 4.5, which offers a mid-band voltage gain of \( A_M = C_I / C_F \). Its input impedance is dictated by the input capacitor \( C_I \) and can be easily sized to tens of kΩ within the transducer bandwidth, so as to sense the transducer’s voltage rather than its current.

![Figure 4.4](image-url)
**Figure 4.4.** The measured impedance of a 150 μm × 150 μm PZT transducer element and its equivalent electrical model.
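To give a feeling for the impedance levels involved, the magnitude of a capacitor's impedance, $|Z| = 1/(2\pi f C)$, can be evaluated across the transducer band. The value of \( C_I \) is not given in this section; the 1 pF used below is a purely hypothetical placeholder, and the function name is illustrative.

```python
import math

def capacitor_impedance_ohm(c_farad: float, f_hz: float) -> float:
    """Magnitude of the impedance of an ideal capacitor."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)

C_I = 1e-12  # F, hypothetical input capacitor value (not from the text)

# Band edges and center frequency of the transducer (Section 4.2.1).
for f_hz in (3.75e6, 5e6, 6.25e6):
    z_kohm = capacitor_impedance_ohm(C_I, f_hz) / 1e3
    print(f"{f_hz / 1e6:.2f} MHz: {z_kohm:.1f} kOhm")

z_mid = capacitor_impedance_ohm(C_I, 5e6)
```

With a capacitor of this order of magnitude, the input impedance stays in the range of tens of kΩ across the band, well above the couple of kΩ presented by the transducer around resonance.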

\(^9\) This ASIC employs the same LNA topology and circuit techniques as described in Chapter 2 (Section 2.1), while embedding it in a larger-scale array. Therefore, the description of the key circuit techniques and implementation is to some extent repeated in this subsection (4.3.1) to improve the readability of this chapter.
A current-reuse OTA based on a CMOS inverter is employed to enhance the power-efficiency of the LNA. In previous inverter-based designs [22], extra level-shifting capacitors ($C_{LS}$) are used to independently bias the NMOS and PMOS transistors, as shown in Figure 4.6a. These level-shifting capacitors and the associated parasitic capacitors at the virtual ground node form a capacitive divider, which attenuates the input signal and thus increases the input-referred noise of the LNA. Enlarging $C_{LS}$ helps in reducing this noise penalty, at the cost of increased die area. In this work, the level-shifting capacitors are eliminated by applying a split-capacitor feedback network [18, 23]. As shown in Figure 4.6b, the input bias points for the NMOS and PMOS transistors are de-coupled by splitting the input and feedback capacitors into two equal pairs, which maintains the same mid-band gain $C_I/C_F$ and the same input impedance.

To maximize the output swing, the bias voltage of the inverter-based OTA should be properly defined. This is usually achieved with the aid of a DC control loop, in which a slow auxiliary amplifier keeps the output at the desired operating point [22]. However, such a DC control loop will recover too slowly from disturbances caused by the high-voltage pulses propagating across the ASIC during the transmit phase. Therefore, instead, we dynamically activate the bias control loop in synchronization with the transmit/receive (TX/RX) cycles of the ultrasound system, as shown in Figure 4.7. During the TX phase, the input of the LNA is grounded and the inverter is essentially auto-zeroed, while the auxiliary amplifier drives the gate of the NMOS transistor so as to bias the output at mid-supply. During the RX phase, the auxiliary amplifier is disconnected, and both its inputs are shorted to the mid-supply. Meanwhile, the LNA starts receiving the echo signal by operating at the “memorized” bias points. Given that the typical TX/RX cycle in cardiac imaging is relatively short, ranging from 100 µs to 200 µs, the bias voltage hardly drifts during the RX phase. The relatively large sizes of the input transistors ($W/L_N = 75/0.2$, $W/L_P = 60/0.2$), needed for both flicker-noise reduction and current-efficiency optimization, also help to keep the bias voltages stable. The sample-and-hold operation associated with the auto-zeroing causes broadband white noise to be sampled on the gate of the NMOS transistor and held constant during the receive phase. Therefore, it appears as a small offset voltage that is superimposed on the “memorized” bias point during each transmit/receive cycle, and does not deteriorate the in-band noise performance of the LNA. Moreover, it is further filtered out by the AC-coupler following the LNA and has no impact on the bias condition of succeeding stages.

![Figure 4.7. Dynamic bias control scheme.](image-url)

A well-known downside of a single-ended inverter-based OTA is its poor power-supply-rejection ratio (PSRR) [24]. As the LNAs are closely integrated with high-frequency digital circuits for beamformer control, the supply line and the ground are inevitably noisy. To improve the PSRR, we generate two internal power rails within each subarray by means of two regulators ($REG_P$ and $REG_N$ in Figure 4.8) that are shared by the 9 LNAs of a subarray. Given that the loading currents of these regulators are known and approximately constant, their implementation can be kept rather simple to save area. A capacitor-less LDO based on a super source-follower [25], capable of providing a PSRR better than 40 dB at 5 MHz, is adopted as the topology for both regulators.

![Figure 4.8. Complete schematic of the LNA.](image-url)

Figure 4.8 shows the complete schematic of the proposed LNA. The inverter-based OTA is cascoded to ensure an accurate closed-loop gain, and input transistors $M_1$ and $M_4$ are biased in weak-inversion to optimize their current-efficiency. The bias voltage of $M_1$, $V_{refP}$, which is derived from a diode-connected PMOS transistor via a high-impedance pseudo-resistor, is shared by the input gate of the positive-rail regulator $REG_P$. Thus, the bias current of the OTA can be defined by the difference of the reference currents ($I_{p1} - I_{p2}$) and the dimension ratio of $M_1$ and $M_{p1}$. In each channel, a unity-gain-connected inverter, implemented with long-channel transistors and consuming only 0.4 µA, is connected between the two regulated power rails to generate a mid-supply reference of approximately 900 mV. The auxiliary amplifier for DC bias control is realized as a simple differential pair. With a current consumption of less than 1 µA, it is capable of settling within the 10 µs TX phase. A switchable capacitive feedback network, involving capacitors 14C and 7C that can be switched in or out under control of the digital gain-control inputs of the ASIC, is implemented to provide the mentioned 3 gain levels for dynamic range enhancement. An explicit loading capacitor (not shown in Figure 4.8) is added at the output of the LNA to limit its -3 dB bandwidth to below 10 MHz.

### 4.3.2 Subarray Beamformer

Figure 4.9 shows the circuit implementation and timing diagram of the subarray beamformer. It consists of 9 programmable analog delay lines, each of which is built from pipeline-operated S/H memory cells that run at a sampling rate of 33 MHz, corresponding to the target delay resolution of 30 ns. Due to the fact that the sampling rate is higher than the designed bandwidth of the LNA, the increase in the noise floor caused by aliasing is negligible.
The capacitor in each memory cell is carefully sized to ensure that the associated kT/C noise is not dominant, while meeting the area requirement. With 300 fF metal-insulator-metal (MIM) capacitors, an input-referred rms noise voltage of about 118 µV is expected for each delay line, which is smaller than the output noise of the LNA at its highest gain setting.
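The quoted kT/C noise level can be cross-checked with a short calculation. The sketch below assumes a temperature of 300 K and a single sampling event per memory cell; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_vrms(capacitance_f, temperature_k=300.0):
    """RMS voltage of thermal noise sampled on a capacitor: sqrt(kT/C)."""
    return math.sqrt(BOLTZMANN * temperature_k / capacitance_f)


# 300 fF MIM capacitor, as used in each memory cell (300 K is an assumption)
noise_vrms = ktc_noise_vrms(300e-15)
print(f"kT/C noise: {noise_vrms * 1e6:.1f} uV rms")
```

Under these assumptions the result is close to the value of about 118 µV given above.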

The outputs of all 9 delay lines are passively joined together to sum up and average the charge sampled on the capacitors that are connected to the output node [11]. Compared to voltage-mode summation [11], this scheme eliminates the need for a summing amplifier, and is thus more compact and power-efficient. However, a potential source of errors is the residual charge stored on the parasitic capacitance at the output node, which causes a fraction of the output of the previous clock cycle to be added to the output signal. This is equivalent to an undesired first-order infinite-impulse-response low-pass filter. While this filtering can be eliminated by periodically removing the charge from the output node using a reset switch [11], here we opt for the simpler solution of minimizing the parasitic capacitance at the output node. It can be shown that an acceptable signal attenuation within the bandwidth of 0-10 MHz of less than 3 dB is obtained if this

![Figure 4.9. Schematic and timing-diagram of the subarray beamformer.](image)
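The first-order IIR behaviour caused by residual charge on the summing node can be illustrated with a simple charge-conservation model. This is a sketch under assumptions not taken from the text: in every clock cycle nine 300 fF capacitors share charge with a parasitic capacitance $C_p$ that still holds the previous output, so that $y[n] = (1-a)\,x[n] + a\,y[n-1]$ with $a = C_p / (9C + C_p)$ and $x[n]$ the average of the nine sampled voltages. The parasitic values in the loop are hypothetical.

```python
import cmath
import math


def charge_sharing_gain_db(freq_hz, fs_hz, c_sample_total_f, c_parasitic_f):
    """Gain (dB) of y[n] = (1 - a) x[n] + a y[n-1], a = Cp / (Cs_total + Cp)."""
    a = c_parasitic_f / (c_sample_total_f + c_parasitic_f)
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    h = (1.0 - a) / (1.0 - a * z_inv)
    return 20.0 * math.log10(abs(h))


FS_HZ = 33e6               # delay-line sampling rate from the text
C_TOTAL_F = 9 * 300e-15    # nine memory-cell capacitors on the summing node
for cp in (0.1e-12, 0.3e-12, 0.6e-12):  # hypothetical parasitic capacitances
    gain = charge_sharing_gain_db(10e6, FS_HZ, C_TOTAL_F, cp)
    print(f"Cp = {cp * 1e12:.1f} pF -> {gain:.2f} dB at 10 MHz")
```

The model shows the expected trend: the gain is 0 dB at DC and the attenuation at the band edge grows with the parasitic capacitance.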
The control logic for programming the delay lines is also integrated within each subarray. Its core is a delay stage index rotator that determines the sequence in which the memory cells are used, as conceptually shown in Figure 4.10. The detailed circuit implementation is shown in Figure 4.11. It consists of an 8-stage shift register (D₁-D₈) in which the 4-bit binary indices of memory cells (1-8) are stored and rotated. Upon startup, register Dₙ is preset to n. D₁ stores the index of the memory cell used for sampling the input signals, while D₂-D₈ store the indices of candidate memory cells for readout. A 3-bit selection code, provided by a built-in serial peripheral interface bus (SPI), decides which of these candidates is used, allowing the delay of the individual delay line to be programmed. One-hot codes derived from the selected 4-bit binary indices are re-timed by non-overlapped clocks to control the sample/readout switches in the memory cells.
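The behaviour of the index rotator can be sketched with a small software model. It is illustrative only: the function name is hypothetical, and the mapping from the 3-bit selection code to a register is not specified in the text, so the model takes the number of delay steps `k` directly and reads the cell whose index sits in D(1+k).

```python
from collections import deque


def simulate_delay_line(samples, delay_steps, n_cells=8):
    """Model one delay line: D1 names the cell being written, D(1+k) the
    cell written k clock cycles earlier (k = delay_steps, 1..n_cells-1)."""
    if not 1 <= delay_steps <= n_cells - 1:
        raise ValueError("delay_steps out of range")
    regs = deque(range(1, n_cells + 1))  # D1..D8 preset to 1..8
    memory = {}                          # cell index -> stored sample
    out = []
    for x in samples:
        memory[regs[0]] = x                        # sample into the cell named by D1
        out.append(memory.get(regs[delay_steps]))  # read the cell named by D(1+k)
        regs.rotate(1)                             # D1->D2, ..., D8->D1
    return out


# None marks cycles in which the selected cell has not been written yet.
print(simulate_delay_line([10, 11, 12, 13, 14], delay_steps=2))
```

With a 30 ns sampling period, `delay_steps=2` corresponds to a 60 ns delay in this model.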

As mentioned in Section 4.2, the SPI interfaces in all subarrays can be either loaded in parallel, or configured as a daisy-chain to load different delay-patterns to individual subarrays. With a 50 MHz SPI clock, only 0.54 μs is needed to program the ASIC’s delay pattern in the parallel mode, while for the daisy-chain mode it takes about 13 μs (subarrays in each quadrant of the ASIC form one daisy-chain), leading to a 9% frame rate reduction for an imaging depth of 10 cm. As such, the daisy-chain mode enables near-field focusing at the expense of a slightly slower frame rate.
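The programming times quoted above are consistent with a simple bit count, sketched below. The figures of 27 bits per subarray (9 delay lines with a 3-bit code each) and 24 subarrays per daisy-chain (96 subarrays over 4 quadrants) are inferred from the text rather than stated in it, and the speed of sound of 1540 m/s is an assumed typical soft-tissue value.

```python
SPI_CLOCK_HZ = 50e6
DELAY_LINES_PER_SUBARRAY = 9
BITS_PER_DELAY_CODE = 3
SUBARRAYS_PER_CHAIN = 24       # assumption: 96 subarrays / 4 quadrants
SOUND_SPEED_M_S = 1540.0       # assumption: typical soft tissue
IMAGING_DEPTH_M = 0.10

bits_per_subarray = DELAY_LINES_PER_SUBARRAY * BITS_PER_DELAY_CODE
t_parallel_s = bits_per_subarray / SPI_CLOCK_HZ
t_daisy_s = SUBARRAYS_PER_CHAIN * t_parallel_s

# Round-trip acoustic travel time for the given imaging depth
t_echo_s = 2.0 * IMAGING_DEPTH_M / SOUND_SPEED_M_S
rate_reduction = t_daisy_s / (t_echo_s + t_daisy_s)

print(f"parallel load:    {t_parallel_s * 1e6:.2f} us")
print(f"daisy-chain load: {t_daisy_s * 1e6:.2f} us")
print(f"frame-rate loss:  {rate_reduction * 100:.1f} %")
```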

### 4.3.3 Mismatch-scrambling

The S/H memory cells suffer from charge injection and clock feed-through errors, the mismatch of which introduces a ripple pattern with a period of 8 delay steps (240 ns) at the output of the delay lines. Such a ripple pattern manifests itself as undesired in-band tones in the output spectrum of the beamformer, which limit the dynamic range of the signal chain.

**Figure 4.10.** Operation principle of mismatch-scrambling.

**Figure 4.11.** Circuit implementation of the delay line control logic with mismatch-scrambling.

To mitigate this interference, we propose a mismatch-scrambling technique by adding an extra memory cell and a redundant index register D₉, as shown in both Figure 4.10 and Figure 4.11. A pseudo-random number generator (PRNG) embedded in each subarray generates a bit sequence (PRBS) that decides whether the index of D₈ or D₉ shifts into D₁, while the other index shifts into D₉. Thus, memory cells are randomly taken out and inserted back into the sequence. This operation randomizes the ripple pattern and converts the interfering tones into broadband noise. The mismatch-scrambling function can be switched on/off with a control bit (MS_EN in Figure 4.11).
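A behavioural sketch of the scrambling step may help to show why the 8-step periodicity disappears. Python's `random` module stands in for the on-chip PRBS here, and the function names are illustrative; only the shift rule (D₈ or D₉ into D₁, the other into D₉) is taken from the text.

```python
import random


def cell_sequence(n_steps, scramble, seed=1):
    """Return the index of the memory cell used for sampling at each clock."""
    regs = list(range(1, 9))   # D1..D8
    spare = 9                  # D9, the redundant index register
    rng = random.Random(seed)  # stand-in for the on-chip PRBS (assumption)
    used = []
    for _ in range(n_steps):
        used.append(regs[0])
        last = regs[-1]
        if scramble and rng.getrandbits(1):
            regs, spare = [spare] + regs[:-1], last  # D9 -> D1, D8 -> D9
        else:
            regs = [last] + regs[:-1]                # D8 -> D1, D9 unchanged
    return used


def is_periodic(seq, period):
    """True if seq repeats itself exactly every `period` steps."""
    return all(a == b for a, b in zip(seq, seq[period:]))


plain = cell_sequence(64, scramble=False)
scrambled = cell_sequence(64, scramble=True)
print("period-8 without scrambling:", is_periodic(plain, 8))
print("period-8 with scrambling:   ", is_periodic(scrambled, 8))
```

Because D₂-D₈ still shift normally, the cell read from D(1+k) remains the one written k cycles earlier, so the programmed delay is unaffected in this model.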

The PRNG in each subarray is implemented as a 12-bit Galois linear-feedback shift register (LFSR) [26]. It can be re-configured as a shift register to allow the sequential loading of its initial state, i.e. the seeds. Similar to the daisy-chain mode of the delay-pattern SPI interface, these shift registers can also be cascaded to allow different seeds to be loaded into the individual subarrays. Applying a set of randomized seeds for all subarrays is expected to further de-correlate the sequences of memory cell rotation on the scale of the full-array. As a result, the excess noise generated by the scrambling process can be suppressed when the output signals of the subarrays are combined by the beamforming operation in the imaging system, thus improving the SNR.
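For reference, a 12-bit Galois LFSR can be modelled in a few lines. The feedback polynomial of the on-chip PRNG is not given in the text; the tap mask `0xE08` ($x^{12} + x^{11} + x^{10} + x^4 + 1$) used below is an assumed, commonly tabulated maximal-length choice, and the function names are illustrative.

```python
def lfsr12_step(state, taps=0xE08):
    """Advance a 12-bit Galois LFSR by one step; return (new_state, output_bit).

    The tap mask is an assumption, not the polynomial used on the ASIC.
    """
    bit = state & 1
    state >>= 1
    if bit:
        state ^= taps
    return state, bit


def lfsr12_period(seed):
    """Count the steps until the register returns to its seed."""
    if seed == 0 or seed > 0xFFF:
        raise ValueError("seed must be a non-zero 12-bit value")
    state, count = seed, 0
    while True:
        state, _ = lfsr12_step(state)
        count += 1
        if state == seed:
            return count


# Different seeds start the same sequence at different phases.
state_a, state_b = 0x001, 0x5A5
bits_a, bits_b = [], []
for _ in range(16):
    state_a, bit_a = lfsr12_step(state_a)
    state_b, bit_b = lfsr12_step(state_b)
    bits_a.append(bit_a)
    bits_b.append(bit_b)
print(bits_a)
print(bits_b)
print("period:", lfsr12_period(0x001))
```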

![Figure 4.12. Schematic of the PGA.](image-url)
### 4.3.4 PGA

Figure 4.12 shows the schematic of the PGA, which is implemented as a current-feedback instrumentation amplifier [17, 27] with a single-ended output. It consists of a differential pair of super source followers with a tunable source-degeneration resistor $R_S$, which acts as a linearized transconductor, and a current mirror with a constant load resistor $R_L$, which converts the transconductor’s output current to a voltage. The voltage gain of the PGA is defined by the resistor ratio $R_L/R_S$. $R_S$ is implemented as a switchable resistor array that ranges from 6 k$\Omega$ to 18 k$\Omega$, while $R_L$ is constant (24 k$\Omega$). To avoid the very large CMOS switches that would be needed for a small on-resistance, Kelvin connections are used to eliminate errors caused by the on-resistance of those switches (Figure 4.12). Compensation capacitors ($C_C$) are added to ensure loop stability. These capacitors are switched along with the gain settings, from 800 fF at the lower gain setting to 400 fF at the highest gain setting. A differential topology is applied to improve the PGA’s immunity to interference. The negative input terminal ($V_{in}$) is connected to the output of a replica delay-line buffer, whose input node is AC-coupled to ground while sharing the same DC bias voltage with the other buffers. The PGA follows the subarray beamformer. Therefore, when comparing its noise contribution with that of the preceding stages, the noise averaging effect [10] of the beamformer should be taken into account. The PGA is designed to have an input-referred noise density below 30 nV/$\sqrt{\text{Hz}}$ to prevent adding excess noise when referred to the input of the LNA.

### 4.3.5 Cable Driver

The cable driver is required to drive the output signal of each subarray across a micro-coaxial cable with a capacitance of up to 300 pF. To maximize its power-efficiency, a class-AB super source follower [28], as depicted in Figure 4.13, is adopted as the topology for the cable driver. Instead of using a high-impedance pseudo-resistor to form a quasi-floating gate, the gate of the PMOS transistor is only connected to the bias circuit during the TX phase, but kept floating during the RX phase, similar to the dynamic DC bias scheme used in the LNA. When referred back to the input of the signal chain, the noise contribution of the cable driver is negligible as it is compressed by the gain of the PGA.

## 4.4 Experimental Results

The ASIC has been realized in a 0.18 µm low-voltage CMOS process with a total area of $6.1 \times 6.1 \text{ mm}^2$, as shown in Figure 4.14a. Figure 4.14b presents a zoom-in view of one subarray receiver that is matched to a $3 \times 3$ group of transducer elements with a pitch of 150 µm. While receiving, the ASIC consumes only 230 mW, which is less than half of the power budget for a 3-D TEE probe.

Figure 4.15a shows a fabricated prototype with an integrated $32 \times 32$ PZT matrix transducer. The assembly has been bonded to a daughter PCB to facilitate acoustic measurements (Figure 4.15b). A matching layer and ground foil are applied on top of the PZT matrix. The ground foil is directly connected to the ground potential of the ASIC via PCB traces. Bonding wires on the periphery of the ASIC are covered by a non-conductive epoxy layer for waterproofing.

The ASIC’s 96-channel subarray outputs and 64-channel high-voltage transmit inputs are connected to a mother PCB via micro-coaxial cables with a length of 1.5 m. The mother PCB is directly mounted on a programmable imaging system (Verasonics V-1 system, Verasonics Inc., Redmond, WA), which acquires the RF data from the ASIC and drives high-voltage pulses via metal traces in the ASIC to transmit elements in the transducer array. Including the required power-supply and digital control lines, the total number of cables required for connecting the ASIC to the imaging system is around 190.

Using this setup, the ASIC’s electrical and acoustic performance have been characterized experimentally, the results of which are presented in this section.

### 4.4.1 Electrical Characterization

The electrical performance of the proposed LNA architecture has been fully characterized and evaluated with a separate test IC [18]. It demonstrates a 9.8 MHz bandwidth, an 81 dB dynamic range and an input-referred noise density of 5.5 nV/√Hz @ 5 MHz at its highest gain, while consuming only 0.135 mW per channel. When interfaced with an external, small PZT array that gives a receive sensitivity of about 10 µV/Pa, the LNA achieves a noise-efficiency factor (NEF) [29] that is $2.5 \times$ better than the prior state-of-the-art [14].

Figure 4.16 shows the measured transfer function of a $3 \times 3$ subarray receiver in the ASIC, with a uniform delay pattern applied to the subarray beamformer. Various combinations of LNA/PGA gain settings were applied to achieve a programmable mid-band gain ranging from $-24$ dB to $24$ dB with a gain step of $6$ dB. The measured absolute values of the mid-band gain levels are approximately $6$ dB lower than the theoretical values of the LNA/PGA gain combinations, which can mainly be attributed to signal attenuation in the delay line buffers and cable drivers and to the attenuation associated with the parasitic capacitance at the beamformer’s summing node. This deviation does not deteriorate the imaging quality, as long as an adequate SNR can be maintained at the subarray output by an appropriate selection of gain settings. The $-3$ dB bandwidth is about $6$ MHz, ranging from $0.3$ MHz to $6.3$ MHz. Note that the sinc-filtering effect of the sample-and-hold operation in the beamformer also contributes to the gain roll-off at higher frequencies, which introduces 4 dB extra attenuation at 16.5 MHz (half sampling frequency).
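The sinc roll-off figure can be reproduced with the standard zero-order-hold response $|\sin(\pi f/f_s)/(\pi f/f_s)|$. The sketch below assumes that each sample is held for a full sampling period, which the text does not state explicitly.

```python
import math


def zoh_gain_db(freq_hz, fs_hz):
    """Gain (dB) of an ideal zero-order hold with a full-period hold time."""
    x = math.pi * freq_hz / fs_hz
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))


FS_HZ = 33e6  # delay-line sampling rate
print(f"{zoh_gain_db(FS_HZ / 2, FS_HZ):.2f} dB at fs/2")
print(f"{zoh_gain_db(5e6, FS_HZ):.2f} dB at 5 MHz")
```

At half the sampling frequency this gives roughly -3.9 dB, in line with the 4 dB quoted above.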

**Figure 4.17.** Measured noise spectrum of the averaged output of 24 subarrays without (a) and with (b) mismatch-scrambling. LNA gain = 6 dB and PGA gain = 6 dB.

**Figure 4.18.** Measured rms noise voltage after post-beamforming as a function of the number of subarrays. Noise is integrated over a bandwidth of 2.5 MHz - 7.5 MHz. LNA gain = 6 dB and PGA gain = 6 dB.

### TABLE-4.1. ASIC Performance Summary

<table>
<thead>
<tr><th>Parameter</th><th colspan="2">Value</th></tr>
</thead>
<tbody>
<tr><td>Supply voltage</td><td>Analog: 1.8 V</td><td>Digital: 1.4 V</td></tr>
<tr><td>Total power</td><td>Analog: 190 mW</td><td>Digital: 38.9 mW</td></tr>
<tr><td>-3 dB Bandwidth</td><td colspan="2">6 MHz</td></tr>
<tr><td>Input-referred noise density @ 5 MHz*</td><td>w/o mismatch-scrambling: 1.0 mPa/$\sqrt{\text{Hz}}$</td><td>with mismatch-scrambling: 2.0 mPa/$\sqrt{\text{Hz}}$ (worst case**)</td></tr>
<tr><td>RX sensitivity</td><td colspan="2">$\sim 5 \mu\text{V/Pa}$ @ LNA input</td></tr>
<tr><td>Gain steps</td><td colspan="2">-12/-6/0/6/12/18/24/30/36 dB</td></tr>
<tr><td>HD2</td><td colspan="2">43 dBc @ 300 mV peak-to-peak output, 5 MHz</td></tr>
<tr><td>Max. peak-to-peak TX pulse voltage</td><td colspan="2">50 V</td></tr>
<tr><td>TX efficiency</td><td colspan="2">$\sim 6 \text{kPa/V}$ @ 5 cm</td></tr>
</tbody>
</table>

*Calculated as subarray output noise density / measured voltage gain at 5 MHz.

**The measured input-referred noise with the mismatch-scrambling function enabled varies with different delay patterns because of a systematic mismatch in the layout of S/H delay lines, which could be optimized by a better layout.

### TABLE-4.2 System-Level Comparison with Prior Works

<table>
<thead>
<tr><th></th><th>[6]</th><th>[13]</th><th>[10]</th><th>[11]</th><th>This work</th></tr>
</thead>
<tbody>
<tr><td>Process</td><td>1.5 $\mu\text{m}$ HV</td><td>0.25 $\mu\text{m}$ HV</td><td>0.18 $\mu\text{m}$ HV</td><td>0.18 $\mu\text{m}$ LV</td><td>0.18 $\mu\text{m}$ LV</td></tr>
<tr><td>Transducer</td><td>CMUT</td><td>CMUT</td><td>CMUT</td><td>PZT</td><td>PZT</td></tr>
<tr><td>Array size</td><td>$16 \times 16$</td><td>$32 \times 32$</td><td>$16 \times 16$</td><td>$9 \times 12$</td><td>$32 \times 32$</td></tr>
<tr><td>Center freq.</td><td>5 MHz</td><td>5 MHz</td><td>5 MHz</td><td>5 MHz</td><td>5 MHz</td></tr>
<tr><td>Element Pitch</td><td>250 $\mu\text{m}$</td><td>250 $\mu\text{m}$</td><td>250 $\mu\text{m}$</td><td>200 $\mu\text{m}$</td><td>150 $\mu\text{m}$</td></tr>
<tr><td>Pitch $\leq \lambda/2$?</td><td>No</td><td>No</td><td>No</td><td>No</td><td>Yes</td></tr>
<tr><td>Beamform Function</td><td>TX</td><td>TX</td><td>Off-chip</td><td>RX Subarray</td><td>RX Subarray</td></tr>
<tr><td># of TX el.</td><td>256</td><td>960</td><td>256</td><td>N/A</td><td>64</td></tr>
<tr><td># of RX el.</td><td>32</td><td>64</td><td>256</td><td>81</td><td>864</td></tr>
<tr><td>Integration method</td><td>Flip-chip bonding Via Interposer</td><td>Flip-chip bonding</td><td>Flip-chip bonding Via Interposer</td><td>Direct Integration</td><td>Direct Integration</td></tr>
<tr><td>ASIC size</td><td>$10 \times 6 \text{mm}^2$</td><td>$9.2 \times 9.2 \text{mm}^2$</td><td>$6 \times 5.5 \text{mm}^2$</td><td>$3.2 \times 3.8 \text{mm}^2$</td><td>$6.1 \times 6.1 \text{mm}^2$</td></tr>
<tr><td>RX power/el.</td><td>9 mW</td><td>4.5 mW</td><td>1.4 mW</td><td>0.44 mW</td><td>0.27 mW</td></tr>
</tbody>
</table>

To investigate the output noise level of the subarray receiver circuits, we use an ASIC without integrated transducer matrix, in which all bond-pads for transducer interconnection are electrically shorted to ground by means of wire bonding. With the highest LNA and PGA gain settings, the electrical output noise density of a $3 \times 3$ subarray is measured as 120 nV/$\sqrt{\text{Hz}}$ at 5 MHz. This is in good agreement with the simulated value of 106 nV/$\sqrt{\text{Hz}}$. With a 300 mV maximum peak-to-peak output amplitude, the peak SNR at the highest gain setting thus found is about 51 dB.
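The peak SNR of about 51 dB at the highest gain setting can be cross-checked from the measured output noise density. The sketch below assumes a sinusoidal output signal and a flat noise density integrated over a 6 MHz brick-wall bandwidth (the measured -3 dB bandwidth); these assumptions are mine and are not stated in the text.

```python
import math

V_OUT_PP = 0.300          # V, maximum peak-to-peak output amplitude
NOISE_DENSITY = 120e-9    # V/sqrt(Hz), measured at the subarray output
BANDWIDTH_HZ = 6e6        # assumption: brick-wall noise bandwidth

signal_rms = V_OUT_PP / (2.0 * math.sqrt(2.0))       # rms of a sine wave
noise_rms = NOISE_DENSITY * math.sqrt(BANDWIDTH_HZ)  # flat-density integration
peak_snr_db = 20.0 * math.log10(signal_rms / noise_rms)

print(f"signal: {signal_rms * 1e3:.1f} mV rms")
print(f"noise:  {noise_rms * 1e6:.0f} uV rms")
print(f"peak SNR: {peak_snr_db:.1f} dB")
```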

Figure 4.17a shows the measured output noise spectrum without enabling the mismatch-scrambling function. Two interference tones appear at fractions of the sampling frequency ($f_s/8$, $f_s/4$), which dominate the noise floor and thus reduce the dynamic range. After enabling mismatch-scrambling (Figure 4.17b), these tones are eliminated from the output spectrum, at the expense of a small increase in the noise floor.

The noise power reduction associated with the system-level beamforming has been measured by combining the subarray output signals acquired using the Verasonics system. Figure 4.18 shows the measured rms noise voltage after beamforming as a function of the number of subarrays. Ideally, if the noise at the outputs of the subarrays is uncorrelated, the noise power after beamforming should decrease in inverse proportion to the number of subarrays involved. Without mismatch-scrambling, this is not the case, because the subarray output signals are dominated by (correlated) mismatch-related tones. With mismatch-scrambling enabled, the noise level shows the expected improvement, i.e. decreasing at a slope close to 10 dB/dec, provided that randomized seeds are delivered to the different pseudo-random number generators. With the same seed used in all subarrays, the tones disappear from the output spectrum, but the randomized mismatch signals of different subarrays are still correlated and hence are not reduced by the averaging operation of the system-level beamformer.
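The difference between randomized and identical seeds can be illustrated with a toy averaging experiment. Unit-variance Gaussian streams stand in for the scrambled mismatch noise of each subarray; this is a statistical sketch, not a model of the measured data, and all names are illustrative.

```python
import math
import random


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def averaged_noise_rms(seeds, n_samples=4000):
    """RMS of the sample-wise average of one noise stream per seed."""
    streams = [random.Random(s) for s in seeds]
    averaged = [
        sum(r.gauss(0.0, 1.0) for r in streams) / len(streams)
        for _ in range(n_samples)
    ]
    return rms(averaged)


N_SUBARRAYS = 16
different_seeds = averaged_noise_rms(list(range(N_SUBARRAYS)))
same_seed = averaged_noise_rms([7] * N_SUBARRAYS)

print(f"different seeds: {20 * math.log10(different_seeds):.1f} dB")
print(f"same seed:       {20 * math.log10(same_seed):.1f} dB")
```

With independent streams the averaged noise drops by about $10\log_{10}N$ dB relative to a single stream, whereas identical streams are not reduced at all.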

Table-4.1 summarizes the measured electrical performance of the ASIC. A system-level comparison with reported works on ASICs for 3-D ultrasound imaging is given in Table-4.2. Our ASIC achieves both the best power-efficiency in receiving and the highest integration density.

### 4.4.2 Acoustic Experiments

The fabricated prototype shown in Figure 4.15 was immersed in a water tank (Figure 4.19) for the evaluation of its acoustic performance. To measure the transmit efficiency of the center subarray, all 64 TX elements were driven simultaneously by the Verasonics system and the pressure was measured at 5 cm using a hydrophone. With a 50 V excitation, a transmit pressure of 300 kPa was measured, leading to a transmit efficiency of about 6 kPa/V.

To characterize the receive beam-steering function of the ASIC, a single-element transducer of 0.5 inch diameter and 5 MHz central frequency (Olympus) has been used as an external source, which generates a quasi-continuous plane wave at the surface of the prototype transducer. The prototype was mounted on a rotating stage and turned from -50° to +50° with a step of 2°. The delays of the subarray beamformers in the ASIC were programmed successively to steer the subarrays' maximum sensitivity towards 0°, 17° and 37°. The corresponding measured subarray beam-profiles, shown in Figure 4.20, are in good agreement with expectations, with the peak of the beams corresponding well to the programmed steering angles.

### 4.4.3 Imaging Results

To demonstrate the 3-D imaging capability of the prototype, a pattern of seven point scatterers (six steel balls and one needle), forming a letter “M” (Figure 4.21a), was placed at a distance of approximately 35 mm in front of the transducer array. A diverging wave was transmitted from the prototype, using a pulse of 18 V (peak-to-peak), generated by the Verasonics system and applied to the transmit subarray through the connections on the ASIC. A 3-D volume image was reconstructed by combining the subarray output signals recorded using the Verasonics system from multiple transmit-receive events, and rendered to get a frontal view of the point scatterers (Figure 4.21b), which clearly shows the layout of the scatterers.

**Figure 4.19.** Schematic diagram of the acoustic experiment setup. For the beam-steering measurements and the characterization of transmit pressure, scatterers were replaced by single-element transducers and a hydrophone, respectively.

Currently, the 3-D image reconstruction is done offline, and 169 transmit-receive events were used to generate one volume as shown in Figure 4.21b [30]. In a future real-time implementation, this would correspond to a frame rate of 44.4 volumes per second for an imaging depth of 10 cm. When the daisy-chain mode for delay-pattern programming is enabled, the frame rate reduces to about 40 volumes per second. We have also noted that volumes can be reconstructed from as few as 25 transmit-receive events, at the cost of slightly degraded image quality [30]. This results in a frame rate of 300 volumes per second in the fast imaging mode.
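The three volume rates follow from one pulse-repetition frequency. The value of 7.5 kHz used below is inferred from 44.4 volumes per second with 169 events per volume and is not quoted in the text; the 13 µs daisy-chain programming time is taken from Section 4.3.2.

```python
PRF_HZ = 7500.0            # assumption: inferred from 169 events at 44.4 volumes/s
DAISY_CHAIN_LOAD_S = 13e-6


def volume_rate(events_per_volume, reprogram_time_s=0.0):
    """Volumes per second when every event costs 1/PRF plus reprogramming time."""
    event_time_s = 1.0 / PRF_HZ + reprogram_time_s
    return 1.0 / (events_per_volume * event_time_s)


print(f"169 events, parallel load:    {volume_rate(169):.1f} volumes/s")
print(f"169 events, daisy-chain load: {volume_rate(169, DAISY_CHAIN_LOAD_S):.1f} volumes/s")
print(f" 25 events, parallel load:    {volume_rate(25):.0f} volumes/s")
```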

## 4.5 Conclusions

A front-end ASIC with a co-integrated 32 × 32 PZT matrix transducer has been designed and implemented to enable next-generation miniature ultrasound probes for real-time 3-D transesophageal echocardiography. The transducer array is split into a transmit and a receive subarray to facilitate the power and area optimization of the ASIC. To address the critical challenge of cable-count reduction, subarray receive beamforming is realized in the ASIC with a highly-compact and power-efficient circuit-level implementation, which utilizes the mismatch-scrambling technique to optimize the dynamic range. A power- and area-efficient LNA architecture is proposed to further optimize the performance. Based on these techniques, the ASIC demonstrates state-of-the-art power and area efficiency, and has been successfully applied in 3-D imaging experiments.



CHAPTER 5

FRONT-END ASIC WITH INTEGRATED SUBARRAY BEAMFORMING ADC

This chapter is based on the publication “A Front-End ASIC with Integrated Subarray Beamforming ADC for 3-D Ultrasound Imaging in Miniature Probes”, submitted to IEEE Journal of Solid-State Circuits.

## 5.1 Introduction

Data acquisition from 2-D transducer arrays has become one of the main challenges for the development of endoscopic and catheter-based 3-D ultrasound imaging devices, such as transesophageal echocardiography (TEE) [1], intracardiac echocardiography (ICE) [2] and intravascular ultrasound (IVUS) [3, 4] probes. The main obstacle lies in the mismatch between the large number of transducer elements ($10^3$ to $10^4$) and the limited cable count in these systems. Recent advances in transducer-on-CMOS integration methods [5, 6] have enabled the use of front-end ASICs performing signal conditioning and data-reduction in proximity of the transducer. The concept of subarray beamforming [7], which divides the transducer array in subarrays and combines the signals received by the elements in each subarray by a local delay-and-sum operation, is capable of reducing the channel count by an order of magnitude. This has been successfully demonstrated in several ASIC prototypes [1, 8] and made it possible to develop commercial 3-D ultrasound probes with 3000+ transducer elements [9]. However, it is still an arduous engineering problem to assemble hundreds of cables within endoscopes or catheters of $\leq 5$ mm diameter [2]. Such constraints have been
forcing system designers to trade off imaging quality against physical dimensions and fabrication cost [2]. On the other hand, the growing clinical demand for higher spatial imaging resolution and a broader field of view keeps driving the development of 2-D arrays with larger aperture size and smaller element pitch, calling for even more aggressive cable-count reduction in the probe.

A variety of efforts have been made in recent years to further reduce the cable-count by making better use of the cable capacity. A straightforward idea is to time-multiplex the received signals from multiple channels onto a single cable in the analog domain [10]. Such an approach, however, suffers from the limited bandwidth and transmission-line effects of the micro-coaxial cables, resulting in significant channel-to-channel crosstalk even with power-hungry impedance matching [11] or equalization [12]. Other analog modulation methods, such as frequency-division multiplexing (FDM), also suffer from the cable non-idealities. A common feature of the abovementioned techniques is the need for high-resolution analog-to-digital converters (ADCs) at the system side to facilitate data demodulation. They normally run at a conversion rate far beyond the ultrasound bandwidth, thus further increasing the system complexity.

To address the data acquisition dilemma, a more radical solution is to move ADCs into the probe and perform the channel-reduction in the digital domain, where complex modulated signals can be transmitted with much better robustness against noise, interference and crosstalk. Moreover, in-probe digitization would open up the possibility to migrate more signal processing functionalities into the probe, such as post-beamforming [13] and compressive beamforming [14], which are expected to further reduce the channel count. A prerequisite to achieve this goal is an efficient way to implement a massively-parallel ADC array within the stringent power and area constraint of miniature ultrasound probes.

Based on the framework of subarray beamforming, in-probe digitization can be realized by digitizing the output of an analog subarray beamformer with per-subarray ADCs [15]. Alternatively, digital beamforming with per-element ADCs can be considered [16, 17, 18]. The latter approach requires $D^2$ ADCs with an equal number of preconditioning circuits and input buffers for a $D \times D$ subarray, plus the associated digital FIFOs for the realization of delays. This is difficult to integrate directly underneath a pitch-constrained 2-D transducer with an affordable power consumption. In [19], the area problem was addressed by using nanoscale CMOS technologies. Nevertheless, the reported element-level ΔΣ ADC is larger than the ideal half-wavelength pitch of 150 µm at 5 MHz, and the power dissipation is more than two orders of magnitude higher than that of its analog counterpart [1]. Moving the ADC to the output of the analog beamformer helps in reducing the area and power cost for A/D conversion. The work of [15] employs subarray beamformers based on S/H circuits followed by a stand-alone Nyquist-rate ADC. It turned out that the overall silicon area and power are both dominated by the analog beamformer, resulting in a per-channel footprint that is 12× larger than the dimension of a transducer element in a 5-MHz array and a power consumption far exceeding the heat dissipation limit for endoscope- and catheter-based probes [20].

In this work, we propose an element-matched ASIC architecture to demonstrate the feasibility of efficient in-probe digitization in miniature ultrasound probes. It provides subarray beamforming for a directly-integrated 5-MHz PZT matrix with a half-wavelength element-pitch of 150 µm. In each 3 × 3 subarray, a compact Nyquist-rate beamforming ADC is implemented following the analog front-end circuits. By merging the beamforming and digitization functions coherently in the charge domain, there is no need for intermediate ADC buffers, resulting in significant power and area reduction. The output of each beamforming ADC is serially exported to datalinks at the ASIC periphery, where a high-speed digital serializer is implemented to reduce the total fan-out channel-count by a factor of 4. Consequently, the ASIC achieves a 36-fold channel-count reduction, while consuming less than 1 mW/element power dissipation in the receive mode. The effectiveness of the proposed architecture has been successfully demonstrated in both electrical tests and a 3-D imaging experiment.

**Figure 5.1.** System overview

This chapter is organized as follows. The architecture of the proposed ASIC is discussed in Section 5.2. Section 5.3 describes the circuit implementation details of the subarray receiver and the datalink. Section 5.4 presents the experimental setup and results. A conclusion is given in Section 5.5.

## 5.2 System Architecture

### 5.2.1 Overview

Figure 5.1 shows an overview of the proposed system. It consists of a front-end ASIC, a 5-MHz 150-µm-pitch PZT matrix and the associated external electronics. The PZT matrix is constructed from a bulk piezoelectric material (CTS 3202 HD) that is 3-D stacked on the ASIC using the PZT-on-CMOS heterogeneous integration process described in [6]. A metallic interconnection layer and a
conductive glue layer create the electrical connection between the back-side electrodes of PZT elements and the bondpad array on the ASIC.

As a proof-of-concept, we use a PZT matrix with a relatively small aperture in this work, while the proposed circuit architecture and layout can be both up-scaled for a complete matrix array transducer for 3-D imaging (similar to the 32 × 32 array presented in [1]). The matrix is divided into 3 × 24 transmit (TX) elements and 6 × 24 receive (RX) elements based on the split-array concept [6]. Similar to [1], the TX elements are directly wired-out to external high voltage pulsers using metal traces in the ASIC, thus enabling the use of a low-voltage CMOS process. Nevertheless, the concepts presented in this chapter are equally applicable to an ASIC in which local TX pulsers are realized using a high-voltage CMOS technology. Every 3 TX elements in the same column are connected together in the ASIC to reduce the overall I/O count. In RX, subarray beamforming is applied on 3 × 3 elements to realize a 9-fold channel-reduction, yielding in total 16 subarrays.

Figure 5.2a shows the architecture of RX electronics in the ASIC. Each subarray receiver consists of a 9-channel analog front-end (AFE) that acquires echo signals from the transducer elements and conveys the conditioned signals to delay lines. The outputs of the nine delay lines are summed up in the analog domain and directly digitized by an ADC. The digitized data is then transferred to a datalink at the periphery of the ASIC, where the data from 4 subarrays are combined on a shared LVDS (low-voltage differential signalling) output channel, thus realizing an extra 4-fold channel-count reduction. The timing and delay control logic circuits are also implemented within each subarray. At the system side, an FPGA receives the beamformed data from the ASIC, which is then transmitted to a PC for image reconstruction.
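The channel counts of the prototype follow directly from the array partitioning described above. The short calculation below only restates numbers from the text; the variable names are illustrative.

```python
RX_ROWS, RX_COLS = 6, 24        # receive part of the PZT matrix
SUBARRAY_SIDE = 3               # 3 x 3 elements per subarray
SUBARRAYS_PER_LVDS = 4          # subarrays sharing one LVDS output

rx_elements = RX_ROWS * RX_COLS
subarrays = rx_elements // (SUBARRAY_SIDE ** 2)
lvds_channels = subarrays // SUBARRAYS_PER_LVDS
channel_reduction = rx_elements // lvds_channels

print(f"{rx_elements} RX elements -> {subarrays} subarrays -> "
      f"{lvds_channels} LVDS outputs ({channel_reduction}-fold reduction)")
```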

### 5.2.2 AFE

The AFE in each channel consists of a low-noise amplifier (LNA) and a programmable-gain amplifier (PGA). Figure 5.2b illustrates the expected peak-to-peak voltage signal received by a transducer element as a function of time. Note that the time axis, assuming a constant speed of sound, is equivalent to the axial depth. Echoes resulting from deeper scatterers will arrive later, and will be more attenuated, leading to an overall peak-to-peak amplitude range from 30 µV to 500 mV, where the instantaneous dynamic range, i.e. the contrast at the same
imaging depth, is about 40 dB. The depth-dependent (and hence time-dependent) attenuation can be compensated by applying time-varying gains in the AFE. Such a time-gain compensation (TGC) function is implemented by distributing discrete gain steps between the two stages of the AFE. Prior to the PGA, the LNA pre-conditions both small echo signals from deep scatterers, where the acoustic and electrical noise determine the detection limit, and large echoes from nearby scatterers, where linearity matters. With a coarse gain step of 18 dB, the dynamic range at the output of the LNA is compressed to approximately 58 dB. This is further interpolated by the PGA, which provides programmable gains stepping from 6 dB to 24 dB with a resolution of 6 dB. Thus, the signal dynamic range at the output of the AFE is the instantaneous dynamic range plus the TGC gain step, i.e. 46 dB, ranging from 4 mVpp to 800 mVpp. This signal conditioning effectively reduces the noise and dynamic range requirements of the succeeding circuits, thus bringing significant power and area advantages for the whole system.
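
As a rough cross-check of the dynamic-range budget, the following sketch recomputes the figures quoted above. It assumes, as the text does, that the dynamic range after each stage equals the instantaneous dynamic range plus the gain step of that stage; the helper `db` is an illustrative name.

```python
import math

def db(voltage_ratio):
    """Convert a voltage ratio to decibels."""
    return 20.0 * math.log10(voltage_ratio)

# Input signal range at the transducer element (from the text).
total_input_dr_db = db(500e-3 / 30e-6)

INSTANTANEOUS_DR_DB = 40.0   # contrast at a fixed imaging depth
LNA_GAIN_STEP_DB = 18.0      # coarse TGC step in the LNA
PGA_GAIN_STEP_DB = 6.0       # fine TGC step in the PGA

dr_after_lna_db = INSTANTANEOUS_DR_DB + LNA_GAIN_STEP_DB
dr_after_pga_db = INSTANTANEOUS_DR_DB + PGA_GAIN_STEP_DB

# Output swing quoted in the text: 4 mVpp to 800 mVpp.
afe_output_dr_db = db(800e-3 / 4e-3)

print(f"Total input range:   {total_input_dr_db:.1f} dB")
print(f"After LNA:           {dr_after_lna_db:.0f} dB")
print(f"After PGA:           {dr_after_pga_db:.0f} dB")
print(f"4 mVpp to 800 mVpp:  {afe_output_dr_db:.1f} dB")
```

The roughly 84 dB input range is thus compressed to about 46 dB at the AFE output, which is consistent with the quoted 4 mVpp to 800 mVpp swing.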

5.2.3 Beamforming ADC

The beamformer in each subarray is constructed in a way similar to the designs described in Chapters 3 and 4. Analog delay lines based on switched-capacitor memory cells are employed because of their simplicity and flexibility in delay control, as well as their good immunity to PVT variations [9, 21, 22]. Each delay line consists of 8 memory cells operating in a time-interleaved fashion at a sampling rate of 30 MHz, corresponding to a delay resolution of 33.3 ns and a maximum range of ~233 ns.
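
The delay-line figures follow directly from the sampling clock: one sampling period of a 30 MHz clock is about 33.3 ns, and 8 time-interleaved cells allow at most 7 periods of delay. The sketch below illustrates this; the function `delay_code` and its rounding behaviour are assumptions for illustration, not the on-chip delay-control logic.

```python
F_S_HZ = 30e6        # sampling rate of the delay line (from the text)
MEMORY_CELLS = 8     # time-interleaved memory cells per delay line

delay_step_ns = 1e9 / F_S_HZ                        # one sampling period
max_delay_ns = (MEMORY_CELLS - 1) * delay_step_ns   # largest programmable delay

def delay_code(target_ns):
    """Hypothetical helper: nearest programmable delay step, clamped to range."""
    code = round(target_ns / delay_step_ns)
    return max(0, min(MEMORY_CELLS - 1, code))

print(f"Delay resolution: {delay_step_ns:.1f} ns")
print(f"Maximum delay:    {max_delay_ns:.1f} ns")
print(f"Code for 100 ns:  {delay_code(100.0)}")
```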

As discussed in Section 5.2.2, the signal dynamic range at the output of each AFE channel is estimated as 46 dB. Considering the extra 9.5 dB SNR gain provided by a 9-channel beamformer, a 10-bit ADC is required at the beamformer output to achieve an adequate quantization resolution. On the other hand, for an ultrasound transducer with a ≥ 50% fractional bandwidth, the ADC sampling rate must be 4 to 10 times the transducer central frequency to maintain a satisfactory side-lobe level [23]. Therefore, in case of a 5-MHz array, an appropriate sampling rate ranges from 20 MS/s to 50 MS/s. Given these specifications, a successive approximation register (SAR) ADC stands out as the architecture choice for its superior power-efficiency [24].
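
The resolution and sampling-rate requirements can be sanity-checked as follows. The sketch assumes the textbook relation SQNR ≈ 6.02·N + 1.76 dB for an ideal N-bit quantizer with a full-scale sine input; this relation is not stated in the text and is used here only as a rough guide.

```python
import math

AFE_OUTPUT_DR_DB = 46.0   # dynamic range per AFE channel (from the text)
N_CHANNELS = 9            # elements per subarray
F_CENTER_HZ = 5e6         # transducer centre frequency

# Coherent summation of N channels with uncorrelated noise gains 10*log10(N) in SNR.
beamforming_gain_db = 10.0 * math.log10(N_CHANNELS)
required_dr_db = AFE_OUTPUT_DR_DB + beamforming_gain_db

def ideal_sqnr_db(bits):
    """Assumed ideal-quantizer SQNR for a full-scale sine wave."""
    return 6.02 * bits + 1.76

min_bits = math.ceil((required_dr_db - 1.76) / 6.02)
margin_10bit_db = ideal_sqnr_db(10) - required_dr_db

fs_min_hz, fs_max_hz = 4 * F_CENTER_HZ, 10 * F_CENTER_HZ

print(f"Beamforming SNR gain: {beamforming_gain_db:.2f} dB")
print(f"Required range:       {required_dr_db:.1f} dB")
print(f"Minimum ideal bits:   {min_bits}")
print(f"Margin with 10 bits:  {margin_10bit_db:.1f} dB")
print(f"Sampling rate range:  {fs_min_hz / 1e6:.0f} to {fs_max_hz / 1e6:.0f} MS/s")
```

Under this idealized assumption roughly 9 bits would just cover the 55.5 dB range, so a 10-bit converter leaves about 6 dB of margin for quantization noise to stay below the analog noise floor.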
Most SAR ADCs perform the quantization in the voltage domain [25, 26], while the delay-and-sum function of switched-capacitor-based (S/H) delay lines is essentially carried out in the charge domain. Therefore, intermediate amplifiers are indispensable for performing a charge-to-voltage conversion to drive the ADC. For instance, an active summing amplifier was used in [15] to sum the charges stored on the delay-line capacitors $C_M$ on a feedback capacitor, while an extra voltage buffer drives the ADC (Figure 5.3a). To implement a unity voltage gain, the feedback capacitor must be equal in size to the total memory capacitance involved in each cycle, i.e. $N \times C_M$, leading to a considerable power consumption in the amplifier. In [15], the summing amplifier consumes more than $10 \times$ the power of the ADC. The summing amplifier can be eliminated by adopting the passive charge-summation scheme [6], as shown in Figure 5.3b.
However, the need for an explicit ADC driver is still problematic. For SAR ADCs operating in the target performance regime, the power dissipated by the input driver is usually comparable to that of the ADC itself, as the combined result of the relatively large input capacitance and the tightly constrained input sampling time [27].

Figure 5.4. The proposed beamforming ADC and the operation timing diagram.

To eliminate the need for the costly ADC driver, we propose to perform the digitization in the charge domain, rather than in the voltage domain. This is achieved by sequentially neutralizing the passively-summed charge with binary-scaled charge references through a successive approximation process. In practice, the charge references can be realized as a pre-charged capacitor DAC array. By doing so, the beamformer and the digitizer are essentially merged together, where the S/H delay lines perform as a multichannel time-interleaved input sampler in a charge-sharing SAR ADC [28]. We will refer to this circuit as a beamforming ADC.

Figure 5.4 shows a block diagram of the proposed beamforming ADC and its timing diagram. Both the beamforming and the digitization are performed differentially to mitigate the impact of common-mode noise and interference. In each channel, the PGA converts the single-ended LNA output to a differential voltage, which is cyclically sampled and held on memory cells under the control of non-overlapped sampling clocks, $S<1:8>$. The charge packets sampled on the memory cells are then released to the summing nodes, $V_{XP}$ and $V_{XN}$, upon the rising edges of channel-specific readout clocks, $R_k<1:8>$, where $k$ ranges from 1 to 9. The delay of a channel is then defined by the time-interval between the falling edges of its sampling and readout clocks. In each readout phase, the successive-approximation charge-balancing process starts following a short time interval reserved for the passive charge redistribution on the joint memory cells. In every bit-cycle, a dynamic comparator detects the sign of the differential voltage on the summing nodes ($V_{XP}$–$V_{XN}$) and dictates the polarity of the charge reference for use in the next iteration. To obviate the need for distributing an oversampled clock, self-timed SAR logic [29] is employed. By the end of the readout phase, a digital representation of the delay-and-summed charge is available. To simplify the output routing, the differential outputs of the dynamic comparator (CPout+/−) are buffered and transmitted to the periphery of the ASIC, where the 10-bit parallel data is successively recovered and synchronized to a high-speed system clock for further processing.
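
The charge-balancing principle can be illustrated with an idealized behavioural model. The sketch below is not the circuit itself: it ignores parasitic capacitance, the change in node capacitance as DAC capacitors are connected, and comparator noise and offset. The full-scale charge `Q_FS`, the peak voltage `V_PEAK` and the example channel voltages are hypothetical values chosen for illustration.

```python
def charge_sar(q_in, q_full_scale, n_bits=10):
    """Ideal charge-balancing SAR conversion.

    q_in is the differential summed charge, assumed to lie within
    +/- q_full_scale. Returns the list of comparator decisions (MSB first).
    """
    residue = q_in
    decisions = []
    for i in range(n_bits):
        bit = 1 if residue >= 0 else 0          # comparator senses the sign
        decisions.append(bit)
        q_ref = q_full_scale / 2 ** (i + 1)     # binary-scaled charge reference
        residue += -q_ref if bit else q_ref     # neutralize the residue toward zero
    return decisions

def decode(decisions, q_full_scale):
    """Reconstruct the charge represented by a list of decisions."""
    q = 0.0
    for i, bit in enumerate(decisions):
        q_ref = q_full_scale / 2 ** (i + 1)
        q += q_ref if bit else -q_ref
    return q

C_M = 133e-15      # memory-cell capacitance (from the text)
N_CH = 9           # channels per subarray
V_PEAK = 0.4       # hypothetical peak differential voltage per channel
Q_FS = N_CH * V_PEAK * C_M

# Hypothetical delayed samples read out from the nine channels in one cycle.
voltages = [0.12, -0.05, 0.30, 0.01, -0.22, 0.08, 0.15, -0.09, 0.04]
q_sum = sum(v * C_M for v in voltages)   # passive charge summation

bits = charge_sar(q_sum, Q_FS)
q_hat = decode(bits, Q_FS)
print("decisions:", bits)
print(f"error: {abs(q_sum - q_hat) / Q_FS * 2 ** 10:.2f} LSB")
```

In this model the residue magnitude halves in every bit-cycle, so the reconstruction error after 10 decisions is bounded by one LSB of charge.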

Upon completion of the charge-to-digital conversion, the summing nodes ($V_{XP}$ and $V_{XN}$) are shorted together to clear the residue charge on the parasitic capacitors, as shown in Figure 5.4 (CPRST). This operation eliminates the
memory effect in the passive charge-summation scheme that causes undesired signal attenuation [1], without introducing extra complexity. It also makes it possible to calibrate the comparator offset in the background, as will be discussed in Section 5.3.4.

### 5.2.4 Datalink

Figure 5.5 shows the detailed block diagram of the datalink circuit. Each subarray ADC is interfaced with a clock-data-recovery (CDR) circuit. It converts the differential return-to-zero (RZ) outputs of the SAR comparator to a single-bit non-return-to-zero (NRZ) data-stream, and extracts an asynchronous clock that is aligned with the recovered data. The data-stream is then synchronized to a 300 MHz global clock in a dual-clock FIFO to facilitate the subsequent data processing.

![Datalink architecture](image)
Before the serialization, every two consecutive 4-bit-wide FIFO outputs are concatenated as an 8-bit byte in a 4b-to-8b converter, which is then mapped to a 10-bit code in an 8b/10b encoder [30]. Such a coding scheme guarantees sufficient state transitions in the data-stream and permits reasonable clock recovery at the system side, thus obviating the need for a per-channel clock line in parallel with the fan-out data. Moreover, it ensures DC balance in the data-stream, which helps both data recovery and error detection at the system side. The encoded 10-bit data is then serialized to a single-bit high-speed data stream running at 1.5 Gb/s, which is buffered by an LVDS driver and transmitted to the imaging system.

The ASIC receives a 300 MHz global LVDS clock from the system, which is converted to CMOS logic levels at the periphery of the ASIC. Here, it serves as the main clock for the core of the datalink, and gets a $5 \times$ multiplication in a delay-locked-loop (DLL) to generate the clock phases needed for the 1.5 Gb/s serialization. On the other hand, a 10:1 clock divider down-converts the main clock to 30 MHz, producing the beamforming clock ($CLK_{BF}$), which is distributed across subarray receivers via a balanced clock tree.
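
The datalink rates can be tied together with simple arithmetic. The sketch below uses only numbers quoted in this section; the variable names are illustrative.

```python
SUBARRAYS_PER_LINK = 4     # subarrays sharing one LVDS output
ADC_BITS = 10              # bits per conversion
F_BF_HZ = 30e6             # beamforming / ADC sampling clock
F_CLK_HZ = 300e6           # global clock received by the ASIC

# Raw payload produced by the four beamforming ADCs on one link.
payload_bps = SUBARRAYS_PER_LINK * ADC_BITS * F_BF_HZ

# 8b/10b coding maps every 8 payload bits to 10 line bits.
line_rate_bps = payload_bps * 10 / 8

clock_divider = F_CLK_HZ / F_BF_HZ            # main clock to beamforming clock
bits_per_clock_period = line_rate_bps / F_CLK_HZ

print(f"Payload:              {payload_bps / 1e9:.1f} Gb/s")
print(f"Line rate:            {line_rate_bps / 1e9:.1f} Gb/s")
print(f"Clock divider:        {clock_divider:.0f}:1")
print(f"Bits per 300 MHz period: {bits_per_clock_period:.0f}")
```

The 1.2 Gb/s payload plus the 25% coding overhead gives the quoted 1.5 Gb/s line rate, i.e. five serialized bits per period of the 300 MHz clock.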

### 5.3 Circuit Implementation

#### 5.3.1 AFE

The LNA in each AFE channel is an improved version of the design described in Chapter 2 and Chapter 4. It is implemented as a single-ended capacitive feedback amplifier with a split capacitor network to achieve a compact layout, and consumes 75 µA.

The PGA implements three functions: 1) providing 4 fine gain steps to define the TGC gain resolution; 2) converting the single-ended LNA output to differential signals to drive the delay lines; 3) low-pass filtering prior to sampling to minimize aliasing. Figure 5.6 shows a complete schematic of the PGA. A programmable capacitor network provides the desired gain levels ranging from 6 dB to 24 dB according to the control code map shown in Figure 5.6. To save area, a T-type capacitor network [31], employing unit capacitors of 33 fF, is used as the feedback element across a compact differential telescopic amplifier. Each PGA consumes 100 µA.
The PGA cell drives a delay line that consists of 8 stages of time-interleaved differential S/H memory cells. Each cell comprises a pair of grounded metal-insulator-metal (MIM) capacitors, which are sized as large as possible (133 fF) within the available area to minimize the kT/C noise contribution to the front-end. As discussed in Chapter 4, the mismatch of the S/H memory cells would introduce a ripple pattern with a period of $M/f_s$, where $M = 8$ is the number of delay steps and $f_s$ is the sampling frequency. To mitigate this interference, the mismatch-scrambling technique proposed in Chapter 4 is equally applicable to this work. As an alternative, however, since the beamformer outputs are digitized synchronously to the same system clock that controls the beamformer, the ripple patterns at different delay settings can be pre-recorded and stored in memory, and then subtracted from the outputs during the normal receive phase. This calibration process is realized off-chip in the back-end digital processing unit (FPGA). This approach takes advantage of the integrated beamforming ADC, and thus not only saves area but also avoids the excess noise associated with the mismatch-scrambling technique [1], at the cost of increased complexity in the back-end signal processing.
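
A minimal sketch of the ripple-subtraction idea is given below. It models the mismatch as a fixed offset per memory cell that repeats every 8 samples, estimates it by averaging 100 recordings with grounded inputs (the number of iterations used in the measurements reported later), and subtracts it from subsequent outputs. The offset values, the noise level and the function names are hypothetical, and only a single delay setting is modelled, whereas the actual calibration stores one pattern per delay setting.

```python
import random

M = 8                          # memory cells per delay line, ripple period in samples
random.seed(0)

# Hypothetical fixed per-cell errors (in LSB) caused by memory-cell mismatch.
cell_offset = [random.uniform(-2.0, 2.0) for _ in range(M)]

def adc_output(n, signal=0.0, noise_rms=0.5):
    """Model of one digitized sample: signal + periodic ripple + random noise."""
    return signal + cell_offset[n % M] + random.gauss(0.0, noise_rms)

def record_ripple(iterations=100):
    """Average repeated recordings with grounded inputs to estimate the ripple."""
    acc = [0.0] * M
    for _ in range(iterations):
        for n in range(M):
            acc[n] += adc_output(n)
    return [a / iterations for a in acc]

ripple = record_ripple()

# Normal receive phase: subtract the stored pattern sample by sample.
raw = [adc_output(n) for n in range(8 * M)]
corrected = [x - ripple[n % M] for n, x in enumerate(raw)]

worst_estimate_error = max(abs(r - c) for r, c in zip(ripple, cell_offset))
print(f"worst-case ripple estimation error: {worst_estimate_error:.3f} LSB")
```

Averaging over many recordings keeps the noise added by the subtraction small compared with the noise of a single sample, which is why the correction does not noticeably raise the noise floor in this model.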

5.3.2 Charge-reference Generation

The generation and distribution of references for a massively-parallel ADC array is challenging. In prior implementations of charge-sharing SAR ADCs [28, 32], the DAC capacitors are precharged by an external voltage source before the start of each quantization. For a 10-bit SAR ADC running at 20-50 MS/s, the available time-slot for precharging is rather short, typically a few nanoseconds, while the total DAC capacitance is beyond 1 pF to ensure adequate matching. When hundreds or thousands of such ADCs are integrated together for a 2-D transducer, the peak current drawn from the external reference source can easily exceed 100 mA. This will readily introduce significant $di/dt$ transients that degrade the ADC performance. The use of large on-chip decoupling capacitors can alleviate this problem, but is not preferred here given the constrained area. An alternative solution is to implement a reference voltage buffer within each subarray, which, however, inevitably introduces a significant power overhead [33].

In this work, we propose to precharge the CDAC with current sources to relax the power and area requirements for charge reference generation. Unlike the approach described in [33], the current source is locally generated in each subarray and self-calibrated during the TX phase in reference to an external voltage ($V_{REF}$). As such, the current source non-linearity and its poor immunity against PVT variations are both mitigated at a low hardware cost, obviating the need for expensive individual back-end digital calibration. In addition, it also simplifies the system-level layout, as no global current reference distribution or high-current voltage reference routing is required.

Figure 5.7 shows the schematic of the charge-reference generator and its timing diagram. It consists of a gated current source ($M_P$), a charge-pump and a calibration comparator.

Intuitively, the charge reference generated by a gated current source can be written as:

$$Q_{REF} = I_P \cdot T_{int} \tag{5.1}$$

where $I_P$ is the precharging current and $T_{int}$ is the precharging duration defined on-chip. It is, however, difficult to maintain the uniformity of $Q_{REF}$ across the whole array, since both $I_P$ and $T_{int}$ have poor immunity to process variations and mismatch. Therefore, we define the charge reference as:

$$Q_{REF} = C_{DAC} \cdot V_{REF} \tag{5.2}$$

where $C_{DAC}$ is the total capacitance of the DAC array and $V_{REF}$ is the desired voltage reference with respect to the AFE output. As the absolute value of capacitors in modern CMOS processes is typically more strictly controlled [34], calibration is applied to $I_P$ so that the voltage settled on $C_{DAC}$ ($V_{DAC}$) after precharging approaches $V_{REF}$. This is accomplished by introducing a calibration phase in synchronization with the TX phase, during which the digitization is not required. During this phase, the charge-pump and the calibration comparator are periodically enabled in a short time interval (CAL) following each precharging. The calibration comparator compares $V_{DAC}$ to $V_{REF}$, and according to the comparison result a unit charge packet is pumped in or pulled out from a MOS memory capacitor ($C_{MOS}$) at the gate of $M_P$ to adjust the value of $I_P$ for the next cycle. The size of the charge packet, which dictates the reference calibration resolution (LSB$_{CAL}$), is set by both the pulse duration time and the magnitude of the sourcing/sinking current in the charge pump. The above-described process repeats for a defined number of cycles, at the end of which the error between $V_{DAC}$ and $V_{REF}$ becomes less than ±1 LSB$_{CAL}$.
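
The calibration loop behaves like a bang-bang (sign-only) feedback loop, which can be sketched as follows. The model is linearized: each comparator decision changes $I_P$ by a fixed amount, whereas on chip a charge packet changes the gate voltage stored on $C_{MOS}$. The starting current, the current step and the number of cycles are hypothetical; $V_{REF}$, $C_{DAC}$ and $T_{int}$ use the values quoted later in this section.

```python
def calibrate_precharge_current(v_ref, c_dac, t_int, i_start, i_step, cycles):
    """Sign-only calibration of the precharging current.

    Returns the final current and the history of precharged DAC voltages.
    """
    i_p = i_start
    history = []
    for _ in range(cycles):
        v_dac = i_p * t_int / c_dac          # voltage on C_DAC after precharging
        history.append(v_dac)
        # The calibration comparator decides the direction; the charge pump
        # then moves the current by one unit step for the next cycle.
        i_p += i_step if v_dac < v_ref else -i_step
    return i_p, history

V_REF = 0.160        # V, reference voltage (from the text)
C_DAC = 1.5e-12      # F, total DAC capacitance (from the text)
T_INT = 25e-9        # s, precharging duration (from the text)
I_START = 5e-6       # A, hypothetical initial current
I_STEP = 0.05e-6     # A, hypothetical calibration step

lsb_cal_v = I_STEP * T_INT / C_DAC           # calibration resolution in volts
i_final, v_history = calibrate_precharge_current(
    V_REF, C_DAC, T_INT, I_START, I_STEP, cycles=128)

print(f"LSB_CAL:        {lsb_cal_v * 1e3:.2f} mV")
print(f"final current:  {i_final * 1e6:.2f} uA")
print(f"final error:    {(v_history[-1] - V_REF) * 1e3:.2f} mV")
```

Once the loop has crossed $V_{REF}$ it toggles around it, so the residual error stays within one calibration step, in line with the ±1 LSB$_{CAL}$ bound stated above.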

During the RX phase, both the charge pump and the calibration comparator are disabled, and the gated current source precharges $C_{DAC}$ based on the bias voltage stored on $C_{MOS}$. The broadband white noise of the charge pump is sampled on $C_{MOS}$ and held constant throughout the RX phase, and therefore it does not contribute any in-band noise to the established charge reference.

Both the precharging current noise and the jitter of the precharging duration ($T_{int}$) lead to noise charge sampled on $C_{DAC}$. The sampled voltage noise power owing to the precharging current noise can be described as:

$$\overline{V^2_{n,\text{current}}} = \frac{kT}{C_{DAC}} g_n R_O \left[ 1 - e^{-2T_{int}/R_O C_{DAC}} \right] \approx \frac{2kT g_n T_{int}}{C_{DAC}^2} \tag{5.3}$$

where $R_O$ is the output impedance of the current source and $g_n$ is its equivalent thermal noise transconductance. Since $g_n$ is proportional to the transconductance of the current source, we have:

$$g_n \sim \sqrt{I_P} \approx \sqrt{\frac{V_{REF} C_{DAC}}{T_{int}}} \tag{5.4}$$

Therefore, the integrated reference noise due to the current noise can be estimated by the following expression:

$$\overline{V^2_{n,\text{current}}} \sim 2kT C_{DAC}^{-\frac{3}{2}} \sqrt{K (W_p/L_p) V_{REF} T_{int}} \tag{5.5}$$

where $K$ is a process constant and $W_p, L_p$ are the dimensions of $M_P$. On the other hand, the jitter noise can be estimated as:

$$\overline{V^2_{n,jitter}} \sim \left(\frac{V_{REF}}{T_{int}} \cdot t_j\right)^2 = \frac{V^2_{REF} t_j^2}{T_{int}^2} \tag{5.6}$$

where $t_j$ is the absolute random jitter of the precharging duration time. In this work, $\overline{V^2_{n,\text{current}}}$ is minimized by making $W_p/L_p \ll 1$, so that $\overline{V^2_{n,jitter}}$ dominates the reference noise. The precharging clock is derived from the ASIC input clock, whose jitter performance is therefore crucial. Maximizing $T_{int}$ helps in relaxing the jitter requirements. To do so, we use a ping-pong charge reference that consists of two identical capacitor DAC arrays, as shown in Figure 5.7. A duration of 25 ns (3/4 of the sampling clock period) is allocated for $T_{int}$, permitting the use of a system clock with moderate jitter ($\sim 20$ ps).
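
To give a feel for the magnitude of the jitter-induced reference noise, the sketch below evaluates the jitter estimate $V_{REF} \cdot t_j / T_{int}$ for the quoted numbers. It uses the $V_{REF}$ of about 160 mV derived later in this section; the function name is illustrative, and the shorter precharging window is a hypothetical comparison point.

```python
def jitter_noise_rms(v_ref, t_int, t_jitter):
    """RMS reference-voltage error caused by jitter of the precharging window."""
    return v_ref * t_jitter / t_int

V_REF = 0.160        # V, from the charge-balancing equation later in this section
T_INT = 25e-9        # s, allocated precharging duration
T_JITTER = 20e-12    # s, moderate system-clock jitter quoted in the text

v_jitter = jitter_noise_rms(V_REF, T_INT, T_JITTER)
v_jitter_short = jitter_noise_rms(V_REF, T_INT / 2, T_JITTER)  # hypothetical shorter window

print(f"T_int = 25.0 ns: {v_jitter * 1e6:.0f} uV rms")
print(f"T_int = 12.5 ns: {v_jitter_short * 1e6:.0f} uV rms")
```

Because the error scales inversely with $T_{int}$, halving the precharging window would double the jitter-induced noise for the same clock quality.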

In the calibration phase, only one DAC array is connected to the gated current source, while in the RX phase they are alternately used for precharging and conversions. By sharing the current source for precharging and the timing logic for generating $T_{int}$, the ping-pong charge reference is free from interleaving spurs caused by the DAC capacitance mismatch.

The topology of each DAC array is similar to [32]. The charge references corresponding to the first 7 MSBs are produced by precharging a bank of binary-scaled capacitors, while those corresponding to the last 3 LSBs are generated using charge redistribution. Metal-oxide-metal (MOM) capacitors with symmetric plate parasitic capacitance are utilized as unit capacitors to minimize the differential capacitance mismatch. The unit capacitor is sized as 23 fF to ensure adequate matching for a 10-bit linearity. In total, 67 unit capacitors are used in each DAC, leading to a total capacitance of about 1.5 pF.

The required reference voltage is determined by the following charge-balancing equation:

$$N \cdot V_{in,max} \cdot C_M = 4 \cdot V_{REF} \cdot C_{DAC} \tag{5.7}$$

where $N$ is the number of subarray elements, $V_{in,max}$ is the maximum differential PGA output swing and $C_M$ is the capacitance of a unit memory cell in a delay line of the beamformer. For $N = 9$, $V_{in,max} = 800$ mV, $C_M = 133$ fF and $C_{DAC} \approx 1.5$ pF, $V_{REF}$ is approximately 160 mV.
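
The numbers above can be reproduced by solving the charge-balancing equation (5.7) for $V_{REF}$. The sketch below does so for the rounded $C_{DAC}$ of 1.5 pF and, for comparison, for the capacitance obtained from 67 unit capacitors of 23 fF; variable names are illustrative.

```python
N = 9                 # subarray elements
V_IN_MAX = 0.8        # V, maximum differential PGA output swing
C_M = 133e-15         # F, memory-cell capacitance
UNIT_CAP = 23e-15     # F, DAC unit capacitor
UNIT_COUNT = 67       # unit capacitors per DAC array

c_dac_units = UNIT_COUNT * UNIT_CAP            # total DAC capacitance from unit count
q_full_scale = N * V_IN_MAX * C_M              # left-hand side of equation (5.7)

v_ref_nominal = q_full_scale / (4 * 1.5e-12)   # using the rounded 1.5 pF
v_ref_units = q_full_scale / (4 * c_dac_units) # using 67 x 23 fF

print(f"C_DAC from unit caps: {c_dac_units * 1e12:.3f} pF")
print(f"V_REF (1.5 pF):       {v_ref_nominal * 1e3:.1f} mV")
print(f"V_REF (67 x 23 fF):   {v_ref_units * 1e3:.1f} mV")
```

Both results lie close to the quoted value of approximately 160 mV; the small difference reflects the rounding of $C_{DAC}$.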

Figure 5.7. (a) Schematic of the charge reference generator and (b) its timing details.

The calibration comparator is implemented as a class-A preamplifier followed by a StrongArm latch, as shown in Figure 5.8. The preamplifier consists of two cascaded stages of resistively loaded differential pairs to ensure sufficient gain, while the sizing of the first stage is optimized for minimizing both noise and offset. On account of the relatively low reference voltage, pMOS input transistors are chosen for the preamplifier. The calibration comparator is powered down during the RX phase, hence its contribution to the overall power consumption is negligible.

### 5.3.3 SAR Logic

For an implementation in a 0.18 µm CMOS process, the high-speed SAR logic dominates the power consumption of the ADC. Figure 5.9a and 5.9c show the schematic and the timing diagram of the proposed asynchronous SAR logic. The differential outputs of the dynamic comparator (CPout+/−) directly trigger a 10-stage differential shift-register, where each stage consists of a pair of D-flip-flops (DFF) with an enable input. Compared to conventional implementations [25, 26], the proposed scheme minimizes the time delay between the comparator decision and the DAC switching, thus relaxing the timing for charge sharing. During the conversion, each DFF pair reads the comparator decision by sensing the rising edges of the comparator outputs. Once a rising edge on either side is detected, the DFF pair is immediately disabled and no longer responsive to succeeding comparison events until the end of the cycle. The data is then captured and frozen to control the switching of the corresponding DAC elements. To identify the completion of a conversion, the enable signal of the LSB DFF is used as the Data_ready signal. An additional DFF stage is used for comparator offset calibration, as will be discussed in Section 5.3.4.

To further reduce the dynamic power consumption, each DFF pair is kept deactivated until the preceding stage has come to a decision. Thus, in every bit-cycle only one DFF pair is clocked for data reading. This is achieved by embedding a local clock-gating buffer within each DFF cell, which defines a bit-wise timing window based on the outputs of the previous and the present stages (Figure 5.9c). The clock-gating buffer is implemented as a dynamic NAND gate followed by a simple latch with a weak feedback inverter [35], as shown in Figure 5.9b. To prevent undesired switching events, the output of the previous stage is properly delayed to guarantee that its rising edge always arrives during the reset phase of the dynamic comparator. Simulation results indicate a 33% dynamic
power reduction in the proposed SAR logic thanks to the adoption of this clock-gating scheme.

5.3.4 Dynamic Comparator

An inherent limitation of the charge-sharing SAR conversion is the discrepancy between the charge-domain signal approximation and the voltage-domain quantization, which leads to more stringent requirements on the input-referred noise and offset of the dynamic comparator [32]. Figure 5.10 shows the schematic of the dynamic comparator, the core of which is a double-rail latch-type voltage sense amplifier [36]. Its first stage is properly sized to ensure a sufficiently low input-referred noise. A self-timing circuit takes the comparator outputs and the Data_ready signal to generate an oversampling clock that schedules the evaluation and reset of the dynamic comparator.

![Figure 5.10: Dynamic comparator with offset calibration.](image)

The comparator offset is dependent on the input common-mode voltage [26] during charge sharing, which, in turn, depends on the parasitic capacitance at the summing nodes and therefore varies between subarrays. To avoid the need for individual offset trimming for each subarray, the offset is self-calibrated in a way similar to [37], which involves a charge pump and an auxiliary input pair with one gate connecting to an external calibration voltage. In contrast with [37], the offset calibration is performed in the background while the SAR conversion is ongoing. As described in Section 5.2.3, the comparator input nodes are shorted to clear the residue charge at the end of each conversion, resulting in a voltage that is close to the common-mode voltage during the LSB cycles. Therefore, by triggering one more comparison, the polarity of the offset can be detected, allowing the charge pump to adjust the bias voltage of the auxiliary input pair. This additional comparison is realized by adding an extra stage in the asynchronous SAR logic, as shown in Figure 5.9. By repeating this process for successive SAR conversions, the offset voltage is progressively minimized within a finite number of ADC cycles. The background calibration can be disabled by nulling the input of the extra logic stage.
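
The background offset calibration can be viewed as a sign-driven loop that takes one step per conversion. The sketch below is a simplified behavioural model, assuming that each extra comparison reveals only the polarity of the residual offset and that the charge pump then moves the offset by a fixed step. The initial offset of 6 LSB matches the uncalibrated value measured on one subarray in the experimental results, while the step of 0.05 LSB is a hypothetical value.

```python
def background_offset_calibration(offset_lsb, step_lsb, conversions):
    """Track the comparator offset over successive SAR conversions."""
    trace = []
    for _ in range(conversions):
        trace.append(offset_lsb)
        # Extra comparison with shorted inputs: only the sign of the offset is
        # observed, and the charge pump corrects by one fixed step.
        offset_lsb += -step_lsb if offset_lsb > 0 else step_lsb
    return trace

STEP_LSB = 0.05       # hypothetical correction per conversion
trace = background_offset_calibration(6.0, STEP_LSB, conversions=150)

# First conversion at which the residual offset is within one step of zero.
settled_at = next(i for i, o in enumerate(trace) if abs(o) <= STEP_LSB + 1e-9)

print(f"settled after about {settled_at} conversions")
print(f"residual offset: {trace[-1]:+.3f} LSB")
```

With these assumed values the loop settles after roughly 120 conversions and then toggles within one step around zero; a smaller step would lengthen the settling time but reduce the residual toggling.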

Because of the relatively low reference voltage for DAC precharging, the input common-mode of the comparator slightly decreases as the SAR conversion proceeds, which leads to a bit-dependent dynamic offset. However, since the input common-mode variation is only a small portion of the final voltage used for calibration, the resulting offset variation and dynamic error charge have a negligible impact on the linearity.

5.3.5 CDR and FIFO

![Figure 5.11. Schematic of the clock-data-recovery (CDR) circuit.](image)

The differential comparator outputs from each subarray are received by CDR circuits at the ASIC periphery, which reconstruct the serial ADC output and a corresponding asynchronous clock (Figure 5.11). Since the comparator outputs
are in RZ format, the clock can be reconstructed from an “OR” operation of the two outputs. A DFF driven by this clock reconstructs the serial ADC data. The DFF has a constant logic high input and is reset by the negative comparator output (CPout-). Proper delays are added to enhance the circuit’s immunity to timing uncertainties.
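
The clock and data recovery just described can be captured in a small behavioural model operating on sampled waveforms. In the sketch below the recovered clock is the OR of the two RZ outputs, and the data flip-flop has its input tied high and is reset by CPout-. The function name and the two-samples-per-bit test pattern are illustrative assumptions, and the added delays of the real circuit are not modelled.

```python
def recover(cp_pos, cp_neg):
    """Recover clock, NRZ data and the bit sequence from RZ comparator outputs."""
    clock, nrz, bits = [], [], []
    q, prev_clk = 0, 0
    for p, n in zip(cp_pos, cp_neg):
        clk = p | n                          # "OR" of the two RZ outputs
        rising = clk == 1 and prev_clk == 0
        if n:
            q = 0                            # flip-flop reset by CPout-
        elif rising:
            q = 1                            # flip-flop input tied to logic high
        if rising:
            bits.append(q)                   # one decision per recovered clock edge
        clock.append(clk)
        nrz.append(q)
        prev_clk = clk
    return clock, nrz, bits

# Build RZ waveforms for a short bit pattern: one pulse sample, one zero sample.
tx_bits = [1, 0, 0, 1, 1, 0]
cp_pos, cp_neg = [], []
for b in tx_bits:
    cp_pos += [b, 0]
    cp_neg += [1 - b, 0]

clock, nrz, rx_bits = recover(cp_pos, cp_neg)
print("clock:", clock)
print("NRZ:  ", nrz)
print("bits: ", rx_bits)
```

The NRZ output holds its value while both comparator outputs return to zero, which is what allows the subsequent FIFO to sample it with the recovered clock.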

Figure 5.12. Delay-locked loop.

The recovered clock and data are fed to a dual-clock FIFO for further synchronization. The “read” operation of the FIFO is driven by the 300 MHz global clock. In order to simplify the data reconstruction at the system side, once a valid data stream is received, the FIFO is expected to operate in neither “empty” state nor “full” state. The “full” state is avoided by selecting a FIFO queue-depth of 16, more than the 10 bits of a single conversion result. To avoid the “empty” state, a 5-cycle delay is applied between the start of the “write” operation and the start of the “read” operation to make sure that enough data is written to the FIFO before reading. The FIFO and the following encoders were implemented and optimized using logic synthesis tools. To enable a bit-error-rate (BER) test for the high-speed data exportation, the FIFO inputs can be switched to the output of an on-chip pseudo-random bit sequence (PRBS) generator with a sequence length of $2^{16}-1$.
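
A PRBS with a sequence length of $2^{16}-1$ can be produced by a 16-bit maximal-length linear-feedback shift register. The sketch below uses the well-known feedback polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$; the polynomial, the seed and the function names are assumptions for illustration, since the text does not specify which generator is implemented on chip.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def sequence_length(seed=0xACE1):
    """Count the steps until the register returns to its seed."""
    state, count = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        count += 1
    return count

def prbs_bits(n, seed=0xACE1):
    """Return the first n output bits of the sequence."""
    state, out = seed, []
    for _ in range(n):
        out.append(state & 1)
        state = lfsr_step(state)
    return out

def count_bit_errors(sent, received):
    """Compare a received stream against the expected PRBS."""
    return sum(a != b for a, b in zip(sent, received))

period = sequence_length()
sent = prbs_bits(1000)
received = list(sent)
received[123] ^= 1                     # inject a single bit error
print(f"sequence length: {period}")
print(f"bit errors: {count_bit_errors(sent, received)} in {len(sent)} bits")
```

In a BER test the receiver runs the same generator, aligns it to the incoming stream and counts mismatches in this way.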

5.3.6 DLL

Figure 5.12 shows a block diagram of the DLL, which is based on [38]. It consists of a 5-stage differential voltage-controlled delay line (VCDL), a phase detector, a charge-pump and an edge combiner. The delay cell in the VCDL is implemented by cascading two differential cross-coupled inverters, the first of which is loaded by RC branches consisting of an NMOS switch and a MOS capacitor. By increasing the switch control voltage $V_{CTRL}$, more capacitance is added to the inverter’s load, thus increasing the delay. The VCDL is powered by a separate 1.2V supply. Once the loop is stable, the edge combiner receives the outputs of the delay cells and generates five consecutive equal-width pulses, which are fed into a 10:1 multiplexer for data serialization.

5.4 Experimental Results

The ASIC has been fabricated in a 0.18 µm 1P6M low-voltage CMOS process and has an area of $4.8 \times 2.5$ mm$^2$, as shown in Figure 5.13a. The floor plan of a 3 × 3 subarray receiver is shown in Figure 5.13b, while its power and area breakdown are shown in Figure 5.14. The bondpads for transducer interconnection are implemented in the top (6th) metal layer, while the 5th metal layer is reserved as a grounded shield to protect the LNA inputs from digital interference. While receiving, each subarray receiver consumes 4.3 mW, corresponding to 0.46 mW/element. The beamforming ADC along with the delay programming logic occupies about half of the subarray area, while consuming only 36% of the subarray power (1.58 mW). The total power consumption of the ASIC including the datalink and LVDS drivers is 130.5 mW, corresponding to 0.91 mW/element.

Figure 5.13c shows a fabricated prototype with an integrated $24 \times 9$ PZT matrix transducer using the approach described in Chapter 2. It is wire-bonded to a daughter PCB for both electrical and acoustic tests. The daughter PCB is mounted
on a custom-designed mother PCB, where an FPGA receives and buffers the high-speed RF data before transmitting it to a PC for image reconstruction.

5.4.1 Electrical Measurements

The electrical performance of the prototype ASIC has been characterized by wire-bonding test input signals to the selected RX transducer pads. The reconstructed digital outputs of each subarray receiver are converted back to a voltage signal according to equation (5.7) to facilitate the performance evaluation.

Figure 5.15 shows the measured subarray receiver transfer function at 12 AFE gain settings. It achieves an overall mid-band gain range of 49 dB, stepping from -7 dB to 42 dB with an average step size of 4.5 dB. The deviation from the ideal gain step (6 dB) is mainly caused by the insufficient open-loop gain of the PGA core amplifier at high gain modes. The average -3 dB bandwidth of the subarray receiver is measured as 11.9 MHz.

**Figure 5.13.** Chip microphotographs.

**Figure 5.14.** Power and area breakdown of one 3×3 subarray receiver.

Figure 5.16 shows the measured subarray input-referred voltage noise spectrum at the highest AFE gain setting, which indicates an input-referred voltage noise density of 6.3 nV/√Hz at 5 MHz. Before applying the digital back-end calibration (Section 5.3.1), the ripple pattern introduced by delay line mismatches appears as in-band interference tones at \( f_s/8 \) (3.75 MHz) and its harmonics. By subtracting the pre-recorded ripple pattern (obtained from 100 iterations with grounded inputs) from the output signal, these interference tones are significantly suppressed without deteriorating the noise floor.

Figure 5.17 shows the measured output spectrum of one subarray receiver at the highest AFE gain setting with a 4.95 MHz sinusoidal test input. It achieves a peak
SNDR of 51.8 dB within an 80% bandwidth (3 MHz to 7 MHz) around the center frequency (5 MHz), where the AFE dominates the noise floor.

Figure 5.18 shows the transient response of one subarray output with the proposed background comparator offset calibration enabled. Upon initialization, the offset settling process takes about 120 ADC cycles (~ 4 µs) to converge. Without calibration, the original output offset of the tested subarray (also shown in Figure 5.18) is about 6 LSB, which is reduced to -1 LSB after calibration.

The high-speed datalink has been evaluated separately using the on-chip PRBS generator, which shows a BER better than $10^{-9}$ across 1 m coaxial cables. To better demonstrate the channel-reduction capability of the datalink, we programmed 4 subarrays that share the same high-speed data output channel with different uniform delays (30 ns, 90 ns, 150 ns and 210 ns). A 3-cycle sinusoidal signal is chosen as the common input to these subarrays, with a frequency of 2 MHz so as to better illustrate the relative time delay. Figure 5.19 depicts the reconstructed time-domain output waveform of these 4 subarrays, recovered from the shared LVDS output port, which clearly shows the expected relative time delays. The worst-case inter-subarray crosstalk is measured as -57 dBc.
Table-5.1 summarizes the electrical performance and compares the work with prior digitization solutions for 3-D ultrasound imaging systems. Based on the table, this work achieves a $10 \times$ improvement in power efficiency, and a $3.3 \times$ higher integration density. When compared with our previous analog output receiver ASIC as presented in Chapter 4, the subarray digitization only costs
about 70% extra power and is realized within the same die size. On the other hand, the high-speed datalink introduces a non-negligible power overhead due to the relatively large feature size of the chosen technology. This, however, can be reduced by adopting a more advanced CMOS technology.

5.4.2 Acoustic Measurements

The acoustic performance of the fabricated prototype has been characterized by mounting a waterbag on top of the PZT-on-ASIC assembly, as shown in Figure 5.20a. A 3-needle phantom was immersed in water and placed about 20 mm in front of the PZT matrix. A diverging wave was transmitted from the prototype by driving 6 elements at the center of the TX subarray (Figure 5.2) using 20 V (peak-to-peak) 3-cycle pulses. In several successive TX-RX cycles, the 16 subarrays in the prototype were steered to different angles to scan the volume.

Figure 5.20b shows the recorded digital outputs of one subarray receiver with different programmed steering angles in the lateral direction. It clearly shows an increase of the echo amplitude when the subarray beamformer is steered towards the specific needle.

Figure 5.21 illustrates a reconstructed B-mode image in the lateral direction. It is obtained by recording and combining the digital outputs of all subarrays, and performing the post-beamforming computation in software. The positions of all 3 needles are clearly shown in the image with a spatial resolution in line with the relatively small RX aperture.

<table>
<thead>
<tr>
<th>TABLE-5.1. Comparison to Prior Work</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Process</strong></td>
</tr>
<tr>
<td><strong>Transducer</strong></td>
</tr>
<tr>
<td><strong>RX Array Size</strong></td>
</tr>
<tr>
<td><strong>No. of Channels</strong></td>
</tr>
<tr>
<td><strong>Center freq.</strong></td>
</tr>
<tr>
<td><strong>RX Architecture</strong></td>
</tr>
<tr>
<td><strong>ADC Architecture</strong></td>
</tr>
<tr>
<td><strong>Nyquist rate</strong></td>
</tr>
<tr>
<td><strong>Pitch-Matched</strong></td>
</tr>
<tr>
<td><strong>Channel Reduction</strong></td>
</tr>
<tr>
<td><strong>Integration Method</strong></td>
</tr>
<tr>
<td><strong>Delay Resolution</strong></td>
</tr>
<tr>
<td><strong>Active Area/el.</strong></td>
</tr>
<tr>
<td><strong>RX Power/el.</strong></td>
</tr>
<tr>
<td><strong>Peak SNDR</strong></td>
</tr>
</tbody>
</table>

* Including the datalink and LVDS drivers.
** ADC only, excluding the analog front-end.
*** Measured at 64-channel beamformer output.

The image was reconstructed from 25 beams (T/R cycles) with a PRF of 5 kHz, leading to a theoretical volume rate of 200 volumes/second. In practice, however, the imaging rate is limited by the data transfer speed between the FPGA and the PC, as well as the software post-beamforming computing time. This constraint could be resolved by migrating the image reconstruction function to the FPGA [39], or by implementing it in a digital ASIC [13].
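The theoretical volume rate quoted for the imaging experiment follows directly from the pulse repetition frequency and the number of transmit/receive cycles per volume. A minimal sketch of that arithmetic, with function and parameter names chosen here for illustration only:

```python
def volume_rate_hz(prf_hz: float, beams_per_volume: int) -> float:
    """Upper bound on the volume rate when each beam needs one T/R cycle."""
    return prf_hz / beams_per_volume


rate = volume_rate_hz(prf_hz=5_000, beams_per_volume=25)
print(rate)          # 200.0 volumes per second
print(1_000 / rate)  # 5.0 ms acquisition time per volume
```

This bound ignores data transfer and software beamforming time, which limit the rate achieved in practice.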

5.5 Conclusions

Figure 5.20. (a) Setup for imaging experiments. (b) Recorded subarray output waveforms with different lateral steering angles.

We have presented a front-end ASIC that enables power- and area-efficient in-probe digitization for next-generation endoscopic and catheter-based 3-D ultrasound imaging systems. It employs a low-power beamforming ADC to realize an additional 4-fold channel-count reduction compared to prior analog subarray beamformer designs. The ADC directly digitizes the subarray beamformer output in the charge domain to eliminate the need for intermediate buffers, resulting in a significant reduction in power consumption and silicon area. Self-calibrated charge references are proposed to further optimize the power efficiency as well as to facilitate the system-level design. The ASIC achieves an overall 36-fold channel-count reduction and a state-of-the-art power efficiency with less than 1 mW/element power dissipation in receive, which is acceptable even when scaled up to a 1000-element probe. A fabricated prototype with integrated transducer has been successfully applied in 3-D imaging experiments.
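The scaling argument can be made concrete with a short budget calculation. The sketch below assumes that the 36-fold figure is the product of 9-element subarray beamforming and 4-to-1 serialization, and it uses the 0.91 mW/element receive power reported for this ASIC; the helper names are illustrative, not part of the design.

```python
import math

ELEMENTS_PER_SUBARRAY = 9      # subarray beamforming: 9 elements per output
SUBARRAYS_PER_DATALINK = 4     # serialization: 4 beamforming ADCs per output


def output_channels(n_elements: int) -> int:
    """Number of high-speed outputs needed for n_elements receive elements."""
    reduction = ELEMENTS_PER_SUBARRAY * SUBARRAYS_PER_DATALINK
    return math.ceil(n_elements / reduction)


def receive_power_w(n_elements: int, mw_per_element: float) -> float:
    """Total receive power in watts for a given per-element dissipation."""
    return n_elements * mw_per_element / 1000.0


print(output_channels(1000))        # 28
print(receive_power_w(1000, 0.91))  # roughly 0.91 W
```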

References


CHAPTER 6

CONCLUSIONS

This final chapter summarizes the main contributions of this thesis, followed by a statement of the main findings. It closes with a vision for future work, covering both circuit-level improvements and system-level extensions.

6.1 Main Contributions

- **Transducer-oriented ultrasound front-end amplifier design optimization (Chapter 2).**

  We have analyzed and compared the noise-power trade-off of different front-end amplifier architectures, based on the electrical impedance characteristics of the target ultrasound transducer, and proposed a design optimization approach. The effectiveness of the analysis and the design approach has been verified by two case studies.

- **Integration of a PZT matrix with a front-end ASIC (Chapter 3).**

  We have successfully demonstrated the feasibility of integrating a fine-pitch PZT matrix on top of a CMOS-based front-end ASIC. The acoustical properties and the electrical response of the PZT-on-CMOS assembly have also been evaluated.

- **Implementation of a 1000+ element front-end ASIC with record power-efficiency (Chapter 4).**

  We have successfully demonstrated a front-end ASIC integrated with a 32 × 32 PZT matrix. The ASIC employs subarray beamforming to achieve a 9-fold channel reduction. By optimizing the circuit implementations, the ASIC achieves a record power efficiency (0.27 mW/element) that is 5× better than the state-of-the-art, while having a compact pitch-matched layout.

- **Implementation of a power-efficient full-matrix element-matched digital receiver (Chapter 5).**

  We have successfully demonstrated a front-end ASIC integrated with a 2-D PZT matrix that enables in-probe digitization with 10× better power efficiency and 3.3× higher integration density than the state-of-the-art. A charge-domain beamforming ADC is proposed to perform digitization at subarray beamformer outputs without the need for explicit input drivers and reference buffers, thus achieving power and area savings.

6.2 Main Findings

**System-level:**

- Direct integration of a 2-D ultrasound transducer and a front-end ASIC addresses the dense interconnection challenge associated with miniature 3-D ultrasound probes. It requires both a careful investigation of the physical (i.e. mechanical, thermal and acoustical) properties of the transducer itself, and a dedicated design procedure in the ASIC for optimizing the interconnect performance (Chapter 2).

- Subarray beamforming is an enabling technique for real-time 3-D ultrasound imaging, and an effective approach for channel-reduction in miniature 3-D ultrasound probes. By dividing a large 2-D array into subgroups with proper sizes and applying the time delay in distributed steps with the aid of in-probe integrated circuits, implementing a fully-sampled matrix transducer with 1000+ elements becomes possible (Chapter 3, 4, 5).

---

10 This conclusion has also been drawn in the previous work [1][6].

- System-level power and area optimization requires a co-design of all circuit building blocks (Chapter 5). An elaborate arrangement of dynamic range transition along the signal readout path not only helps in achieving an optimal noise-power trade-off, but also contributes to the overall area reduction (Chapter 4, 5).

- Cable-reduction for ultrasound probes can be achieved not only by compressing the data volume in the probe, but also by making better use of the cable capacity in both time and bandwidth (Chapter 5).

- The main significance of in-probe digitization is bringing the strength of digital signal processing and high-speed datalink into the development of next-generation miniature 3-D ultrasound probes (Chapter 5).

**Circuit-level:**

- The optimal LNA architecture choice is dictated by the transducer impedance characteristic (Chapter 2).

- Given the single-ended nature of ultrasound transducers, a single-ended LNA yields the best power efficiency. However, special care must be taken to protect the LNA from supply-line disturbances, such as supply noise and ground bounce during the transmit phase. An adequately high PSRR for both power rails (supply and ground) is a necessity for ultrasound LNAs (Chapter 2, 3, 4).

- Dynamic biasing synchronized to the system’s transmit/receive cycles (Chapter 2, 4, 5) is an effective and efficient approach for reducing both noise and power in ultrasound receive circuits.

- The main difficulty in realizing in-probe digitization for 3-D ultrasound transducers is not in the design of the ADC core circuits. Instead, the key challenge lies in the implementation of ‘auxiliary’ building blocks, such as input drivers and reference generators, and in the chip-level layout. Circuit innovations are required to efficiently ensure the proper operation of a massively-parallel ADC array, rather than a single ADC (Chapter 5).

- The dynamic range of time-interleaving switched-capacitor delay lines is limited by the periodic ripple pattern caused by mismatch of the memory cells. Such interference can be mitigated by applying the mismatch-scrambling technique with a minimum hardware cost, which randomizes the running sequence of the memory cells and converts the ripple pattern into broadband noise (Chapter 4). An alternative solution is to subtract the ripple pattern in the digital backend, which relies on pre-extraction of the ripple pattern during foreground calibration (Chapter 5). It may, however, degrade the frame rate when dynamic focusing/steering is required.

- Charge-domain summation is inherently compatible with time-interleaving switched-capacitor delay lines. The residue charge on the summing node parasitic capacitance leads to undesired low-pass filtering effects, which can be eliminated by introducing periodical reset phases followed by resampling (Chapter 3), or simply mitigated by minimizing the parasitic capacitance by layout optimization for system simplicity (Chapter 4). When performing direct charge-domain digitization at the summing nodes, this issue is conveniently mitigated by the inherent reset operation of the ADC without extra costs (Chapter 5).

- By merging the beamforming and digitization functions in the charge-domain, a beamforming ADC eliminates the need for explicit input drivers, leading to substantial reduction of power and area (Chapter 5).
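The effect of mismatch-scrambling described in the findings above can be illustrated with a deliberately simplified behavioral model. The sketch assumes 8 memory cells with hypothetical random offsets and models scrambling as a purely random choice of the cell used for each sample; the actual control logic (one extra memory cell plus sequencing logic) is not reproduced here. It compares how much of the non-DC spectral power sits in the single strongest bin for a fixed rotation and for a randomized order.

```python
import cmath
import random

N_CELLS = 8        # memory cells per delay line (hypothetical)
N_SAMPLES = 256    # analysed output samples, a multiple of N_CELLS

rng = random.Random(0)
# Hypothetical per-cell offsets standing in for charge-injection and
# clock feed-through mismatch.
cell_offsets = [rng.gauss(0.0, 1.0) for _ in range(N_CELLS)]


def sequential_readout(n):
    """Fixed rotation: the offset pattern repeats every N_CELLS samples."""
    return [cell_offsets[k % N_CELLS] for k in range(n)]


def scrambled_readout(n):
    """Simplified scrambling: a random cell is used for every sample."""
    return [cell_offsets[rng.randrange(N_CELLS)] for _ in range(n)]


def power_spectrum(x):
    """DFT power in bins 1 .. N/2 (DC excluded), computed directly."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * f * k / n)
                    for k, v in enumerate(x))) ** 2
            for f in range(1, n // 2 + 1)]


def strongest_bin_fraction(x):
    """Share of the summed bin power held by the strongest bin."""
    p = power_spectrum(x)
    return max(p) / sum(p)


tone_like = strongest_bin_fraction(sequential_readout(N_SAMPLES))
noise_like = strongest_bin_fraction(scrambled_readout(N_SAMPLES))
print(f"sequential: {tone_like:.3f}, scrambled: {noise_like:.3f}")
```

With a fixed rotation the ripple power is confined to the four harmonics of the rotation frequency, so the strongest bin holds at least a quarter of it; with a randomized order the same power is spread over all bins.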

6.3 Future work

- **Further optimization of the switched-capacitor-based beamformer and beamforming ADC.**

As explained in Chapter 4 and Chapter 5, although the mismatch-scrambling technique removes the ripple pattern tones from the spectrum, it also leads to an elevated noise floor. A background switched-capacitor ripple-reduction-loop (SC-RRL) can be introduced to eliminate the periodic offset pattern from its origin, but novel circuit techniques are called for to minimize the associated power and area overhead. The beamforming ADC architecture described in Chapter 5 can be leveraged to address this challenge. It facilitates the implementation of a digital SC-RRL, which allows the storage of offset patterns in low-cost digital memory, as well as reuse of the ADC infrastructure (e.g. the charge-reference CDAC) for dynamic offset calibration. Furthermore, by dynamically tuning the charge-reference during precharging, it is convenient to embed (part of) the time-gain-compensation functionality within the beamforming ADC, thus further simplifying the system. To serve this purpose, a dynamic comparator architecture with programmable noise performance should first be investigated to avoid undesired degradation of the power efficiency.

- **Incorporating compact transmit pulsers with existing receivers to enable a full transceiver aperture.**

As described in Chapter 4, the transducer matrix configuration proposed in this work enables an ultra-compact chip layout by splitting transmit/receive subarrays. This is, however, at the expense of a relatively small transmit aperture, which limits the use of transmit focusing. The realization of a full transmit/receive aperture requires a compact integrated transceiver with pitch-matched layout. Recent and future advances in semiconductor manufacturing and integration technology may help to address this challenge. As high-voltage (HV) and silicon-on-insulator (SOI) processes at lower technology nodes (< 90 nm) mature, the space required for HV isolation in the silicon substrate is expected to shrink further. Meanwhile, the integration density can be increased in the other dimension with the aid of 3-D chip stacking based on through-silicon-vias (TSVs) [2].

- **Digital-enhanced cable reduction.**

Power-efficient in-probe digitization in this work (Chapter 5) will open up a new realm for data compression and cable reduction in medical ultrasound. As explained in Chapter 5 and Chapter 6.1, compact and power-efficient digital signal processing in deep-submicron technologies can make a variety of powerful data and image compression techniques applicable for the battle against the massive cable connections between ultrasound probes and the imaging system.

The future possibilities, however, are not limited to that. Innovations in other technology fields may also contribute to enhancing the data transmission capacity. Optical links [3] are capable of conveying the full digitized dataset from a 1000+ element ultrasound probe over a single fiber cable, which has a much higher bandwidth than a coaxial copper cable. Recent research has reported the feasibility of integrating vertical-cavity surface-emitting lasers (VCSELs) on a CMOS chip [4], paving the way towards optical communication between a digital 3-D ultrasound probe and the imaging system. Furthermore, the low-power digitization solution can also be combined with advanced ultra-wide-band (UWB) circuits to enable wireless data transmission. This is essential for a family of emerging medical ultrasound devices that are aiming at implantable and wearable applications, such as ultrasound capsules [5], pills and patches.

- **System-on-chip ultrasound imager.**

As mentioned in the introduction of this thesis, the miniaturization of ultrasound imaging devices, partially driven by the endless curiosity of engineers, never stops. The ultimate target of this process is to establish the entire ultrasound imaging system on silicon chips. The demonstrated feasibility of implementing efficient on-chip digitization in ultrasound ASICs, along with the foreseeable utilization of more advanced chip processing technologies and the other technology innovations mentioned above, will make it possible in the near future to realize, and even to commercialize, a system-on-chip ultrasound imager that incorporates the majority of image processing functions and acts as a stand-alone device delivering the image dataset to graphical user interfaces (GUIs) built in software. Such an imager can be implemented as a single chip, namely an ultrasound processing unit (UPU), or as a chip-set that consists of an ultrasound front-end (UFE) and an image processing unit (IPU). From a social perspective, an ultrasound system fully miniaturized down to a chip, in combination with a well-connected healthcare database, will increase the accessibility of ultrasound as a diagnostic tool for the general population, increase the frequency of medical inspection, and ultimately transform human healthcare from diagnosis and treatment to prediction and prevention.

Reference


This thesis describes the analysis, design and evaluation of front-end application-specific integrated circuits (ASICs) for 3-D medical ultrasound imaging, with the focus on the receive electronics. They are specifically designed for next-generation miniature 3-D ultrasound devices, such as transesophageal echocardiography (TEE), intracardiac echocardiography (ICE) and intravascular ultrasound (IVUS) probes. These probes, equipped with 2-D array transducers and thus the capability of volumetric visualization, are crucial for both accurate diagnosis and therapy guidance of cardiovascular diseases. However, their stringent size constraints, as well as the limited power budget, increase the difficulty in integrating in-probe electronics. The mismatch between the increasing number of transducer elements and the limited cable count that can be accommodated, also makes it challenging to acquire data from these probes. Front-end ASICs that are optimized in both system architecture and circuit-level implementation are proposed to tackle these problems.

Given the small form factor of miniature 3-D ultrasound probes, the 2-D array transducer must be directly mounted on top of the front-end ASIC, calling for a high-density interconnect scheme to establish the dense electrical connections between the transducer elements and the silicon chip. The direct integration of a 2-D PZT matrix with a CMOS chip, however, is limited by several physical constraints. The assembly process for PZT transducers requires a relatively low temperature to prevent de-polarization of the piezo-material. Moreover, a non-conductive mechanical buffer layer is required between the PZT matrix and the ASIC to protect the latter from the dicing process. To address these challenges, a direct PZT-on-CMOS integration scheme is developed in this work. It employs a composite interconnection layer, which consists of a layer of non-conductive epoxy with metallic channels, and a layer of conductive glue. The transducer-on-chip integration scheme not only helps in down-sizing the ultrasound probes, but also contributes to minimizing the parasitic capacitance added to each transducer element, thus enhancing the sensitivity to ultrasound signals.

The dynamic range of the detectable ultrasound signal is usually determined by the low-noise amplifier (LNA) in the receive circuitry, which directly interfaces with the transducer element. To achieve an input-referred noise that is sufficiently small compared to the transducer’s noise, the majority of the ultrasound receiver power is usually consumed by the LNA. To optimize the noise-power trade-off of the ultrasound receiver system, the characteristic of the target transducer must be taken into account, and the optimal choice of the LNA architecture is dictated by the electrical impedance and parasitic capacitance of the transducer element. A comprehensive analysis of the optimal architecture choice of ultrasound LNAs is given in this thesis as a guideline for designers.

Besides signal conditioning, another important task of the front-end ASIC is to enable in-probe channel-count reduction. Subarray receive beamforming is adopted in this work to reduce the number of cables by about an order of magnitude. Its basic principle is to divide the 2-D transducer array into properly sized subarrays, each corresponding to one subarray beamformer, and to divide the beamforming operation into two steps. The subarray beamformer only applies fine delays for the individual elements, while the coarse delays are applied in the external imaging system, so as to reduce the implementation complexity of the on-chip delay lines. In this work, subarray beamforming is accomplished in the analog domain to achieve superior power-efficiency. Further channel reduction is realized in the digital domain with the aid of compact subarray beamforming ADCs, which directly digitize the analog beamformer output and transport the bitstream to datalinks on the ASIC periphery. There, the outputs of every four beamforming ADCs are serialized into a GHz-rate data stream and exported to off-chip FPGAs, thus achieving an extra 4-fold channel-count reduction.
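The two-step delay scheme can be sketched with integer sample delays. The example below is a toy model, not the ASIC's signal chain: it uses a hypothetical 1-D arrangement of two subarrays with three elements each, unit impulses as echoes, and made-up delay values. Fine delays are applied and summed inside each subarray, and the coarse delays are applied afterwards to the subarray outputs, as the external imaging system would do.

```python
N = 32                           # samples per receive line (hypothetical)
T_ALIGN = 20                     # sample index where all echoes should align
COARSE = [0, 6]                  # per-subarray delays, applied in the system
FINE = [[0, 1, 2], [0, 1, 2]]    # per-element delays, applied in the probe


def shift(x, n):
    """Delay a sequence by n samples, padding with zeros."""
    return [0] * n + x[:len(x) - n]


def delay_and_sum(signals, delays):
    """Delay every signal by its own amount and add the results."""
    out = [0] * N
    for x, d in zip(signals, delays):
        out = [a + b for a, b in zip(out, shift(x, d))]
    return out


def impulse(at):
    """Unit echo arriving at sample index `at`."""
    return [1 if n == at else 0 for n in range(N)]


# Each element receives the echo early by exactly its total delay.
elements = [[impulse(T_ALIGN - (COARSE[s] + f)) for f in FINE[s]]
            for s in range(len(COARSE))]

# Step 1 (in probe): fine delays, 6 element signals become 2 subarray outputs.
subarray_outputs = [delay_and_sum(elements[s], FINE[s])
                    for s in range(len(COARSE))]

# Step 2 (imaging system): coarse delays on the subarray outputs.
beam = delay_and_sum(subarray_outputs, COARSE)
print(beam[T_ALIGN])   # 6: all six echoes add coherently
```

Only the subarray outputs leave the probe, which is where the channel-count reduction comes from, while the on-chip delay lines only need to cover the short fine-delay range.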

The analog beamformers are realized with pipeline-operated switched-capacitor delay lines for their outstanding power-efficiency, flexibility and good immunity to process/voltage/temperature (PVT) variations. To eliminate the need for a summing amplifier, the signal summation is performed in the charge domain, rather than the voltage domain, by passively joining the outputs of the delay lines. The parasitic capacitance at the summing node should be carefully evaluated as it introduces an undesired low-pass filtering, which can be eliminated by periodic charge reset at the cost of an increased complexity. Based on this architecture, a power- and area-efficient subarray beamforming ADC has also been realized in the charge domain, with the aid of a charge-sharing successive-approximation (SAR) ADC. During quantization, the passively-summed signal charge is sequentially neutralized by binary-scaled charge references, which are implemented as precharged capacitor DAC arrays.
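The quantization principle can be expressed as an idealized charge-balance loop. The sketch below is a behavioral model under simplifying assumptions: it tracks only the residual charge on the summing node, ignores the voltage attenuation caused by charge sharing and all comparator non-idealities, and uses a signed full-scale range chosen for illustration.

```python
def charge_sar(q_signal, q_full_scale, n_bits):
    """Neutralize q_signal with binary-scaled reference charges.

    Returns the comparator decisions (MSB first) and the final residue.
    """
    residue = q_signal
    bits = []
    for i in range(1, n_bits + 1):
        q_ref = q_full_scale / 2 ** i        # precharged reference packet
        bit = 1 if residue >= 0 else 0       # comparator sees the sign only
        bits.append(bit)
        # Push the residue towards zero with a packet of opposite sign.
        residue += -q_ref if bit else q_ref
    return bits, residue


def decode(bits, q_full_scale):
    """Charge estimate implied by the decisions."""
    return sum((1 if b else -1) * q_full_scale / 2 ** i
               for i, b in enumerate(bits, start=1))


bits, residue = charge_sar(q_signal=0.3, q_full_scale=1.0, n_bits=10)
print(bits, decode(bits, 1.0), residue)
```

For inputs within the full-scale range, each step halves the bound on the residual charge, so after `n_bits` steps the residue is no larger than `q_full_scale / 2**n_bits`.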

The switched-capacitor delay lines, however, suffer from charge injection and clock feed-through errors, the mismatch of which results in a ripple pattern with an in-band frequency at the beamformer output. This limits the dynamic range of an analog beamformer. Two different approaches are proposed in this work to mitigate this interference. The first approach, referred to as mismatch-scrambling, randomizes the operating sequence of the delay stages by adding an extra memory cell and the associated control logic. By doing so, the interfering tones in the spectrum are converted into broadband noise, at the cost of a slightly elevated noise floor. As an alternative, the ripple patterns can be pre-recorded and stored in digital memory, and then subtracted from the beamformer output during the normal receive operation. This back-end calibration relies on the integration of the subarray beamforming ADC, which digitizes the ripple patterns synchronously to the beamformer clock. The latter approach trades off the complexity in the back-end digital processing for a better noise performance in the analog front-end.
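The second approach, pre-recording the ripple pattern and subtracting it digitally, can be sketched as follows. The model is an assumption-laden illustration: the ripple values, the noise level during calibration, the number of averaged periods and the test signal are all hypothetical, and the ripple is taken to depend only on the clock phase (sample index modulo the number of memory cells).

```python
import math
import random

N_CELLS = 8
# Hypothetical fixed ripple, one value per memory-cell phase.
RIPPLE = [0.8, -0.3, 0.5, -0.9, 0.1, 0.6, -0.4, -0.2]
rng = random.Random(1)


def beamformer_output(signal, noise_rms):
    """Digitized beamformer output: signal plus phase-locked ripple and noise."""
    return [s + RIPPLE[n % N_CELLS] + rng.gauss(0.0, noise_rms)
            for n, s in enumerate(signal)]


def calibrate(n_periods, noise_rms=0.05):
    """Foreground calibration: record with zero input, average per phase."""
    record = beamformer_output([0.0] * (N_CELLS * n_periods), noise_rms)
    return [sum(record[p::N_CELLS]) / n_periods for p in range(N_CELLS)]


def subtract_ripple(samples, pattern):
    """Remove the stored pattern, synchronized to the clock phase."""
    return [x - pattern[n % N_CELLS] for n, x in enumerate(samples)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


echo = [math.sin(2 * math.pi * n / 20) for n in range(160)]
raw = beamformer_output(echo, noise_rms=0.0)
pattern = calibrate(n_periods=64)
clean = subtract_ripple(raw, pattern)

raw_error = rms([r - e for r, e in zip(raw, echo)])
residual_error = rms([c - e for c, e in zip(clean, echo)])
print(f"ripple before: {raw_error:.3f}, after: {residual_error:.4f}")
```

Averaging over many periods keeps the noise captured during calibration from being added back into the corrected output, and the subtraction only works when the stored pattern is indexed with the same phase as the beamformer clock.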

Power minimization is always the key challenge in the design of ultrasound front-end ASICs, and the foundation of low-power analog design is to increase the efficiency of the current utilization. Based on this principle, a current-reuse transconductor based on a CMOS inverter is employed to enhance the current efficiency of the LNA. The single-ended nature of the LNA, however, makes it challenging to appropriately bias the inverter. Several innovative circuit techniques are proposed to tackle this problem, including a split feedback capacitance network, and a dynamic feedback bias control. Similarly, a class-AB super source follower is adopted as the topology of the analog output cable driver to further save power.

An ultrasound front-end ASIC operates synchronously with the transmit/receive cycles of the ultrasound system. That is to say, the receiver circuits can be periodically switched to an idle mode during the transmit phase. This feature is fully exploited in this work to further improve the circuit performance. For example, the static bias control in the analog front-end, such as the high-impedance DC feedback path in capacitive-feedback amplifiers, and the baseline voltage biasing of current mirrors, can be replaced by switches that are dynamically activated during the transmit phase, so as to prevent adding extra noise during the receive phase.

Finally, an effective on-chip power and reference distribution is crucial for ensuring the proper functioning of ultrasound front-end ASICs, because of both the large chip scale and the mixed-signal environment. However, in conventional circuit implementations, this concern often works against the ambition of achieving superior power-efficiency, so innovative alternatives are called for. In this work, dual-rail local regulation based on simple capacitor-less regulators is implemented to provide sufficient power line noise rejection for the single-ended inverter-based LNA array. The regulators are shared by 9 channels in each subarray to save both power and area. To provide uniform references for the large-scale ADC array, an efficient charge-reference generator based on self-calibrated current sources is proposed to get rid of bulky on-chip bypass capacitors and power-hungry voltage buffers.

The techniques described in this thesis have been applied in several prototype realizations, including one LNA test chip, one PVDF readout IC, two analog beamforming ASICs and one ASIC with on-chip digitization and datalinks. All prototypes have been evaluated both electrically and acoustically. The LNA test chip achieved a noise-efficiency factor (NEF) that is $2.5 \times$ better than the state-of-the-art. One of the analog beamforming ASICs achieved a 0.27 mW/element power efficiency with a compact layout matched to a 150 µm element pitch. This is the highest power-efficiency and smallest pitch to date, in comparison with state-of-the-art ultrasound front-end ASICs. The ASIC with integrated beamforming ADC consumed only 0.91 mW/element within the same element area. A comparison with previous digitization solutions for 3-D ultrasound shows that this work achieved a $10 \times$ improvement in power-efficiency, as well as a $3.3 \times$ improvement in integration density.

This thesis describes the analysis, design and evaluation of application-specific integrated circuits (ASICs) for 3-D medical ultrasound imaging, with the emphasis on the receive electronics in the tip of the ultrasound probe. The ASICs are specifically designed for the next generation of miniature 3-D ultrasound probes, which can be applied in transesophageal echocardiography (TEE), intracardiac echocardiography (ICE) and intravascular ultrasound (IVUS). These probes are equipped with 2-D array transducers and are therefore capable of volumetric visualization, and they are crucial for both accurate diagnosis and therapy guidance of cardiovascular diseases. Their limited size and limited power budget make it difficult to integrate electronics in the probe. The increasing number of transducer elements and the limited number of cables that can be used make it a challenge to send all data from these probes to the imaging system. To tackle these problems, ASICs are proposed that are optimized in both system architecture and circuit-level implementation.

Because of the small form factor of miniature 3-D ultrasound probes, the 2-D array transducer must be mounted directly on top of the ASIC, and because of the high density a dedicated interconnect scheme is needed to establish the electrical connections between the elements and the chip. The direct integration of a 2-D PZT matrix on a CMOS chip is, however, complicated by the physical properties of the materials used. The assembly of PZT transducers requires a relatively low temperature to prevent depolarization of the piezo-material. Moreover, a non-conductive mechanical buffer layer is required between the PZT matrix and the ASIC to protect the ASIC from the dicing process. In this work, a PZT-on-CMOS integration method has been developed to address these challenges. The method uses a composite interconnection layer, which consists of a layer of non-conductive epoxy with metallic channels and a layer of conductive glue. The transducer-on-chip integration method not only helps to make the ultrasound probes smaller, but also contributes to minimizing the parasitic capacitance added to each transducer element, thereby increasing the sensitivity to ultrasound signals.

The dynamic range of the ultrasound signal is usually determined by the low-noise amplifier (LNA) in the receive circuit, which is directly connected to the transducer elements. To obtain a noise level that is low enough compared to the noise level of the transducer, the largest part of the power in the receive circuit is usually consumed by the LNA. To optimize the noise-power trade-off of the receive circuit, the characteristic of the chosen transducer must be taken into account, with the optimal choice of the LNA architecture being determined by the electrical impedance and parasitic capacitance of the transducer element. This thesis gives a comprehensive analysis of the optimal architecture choice as a guideline for designers.

Besides conditioning the signal, reducing the number of cables between the probe and the imaging system is an important task of the ASIC. In this work, subarray receive beamforming is used to reduce the number of cables by about an order of magnitude. The basic principle is to divide the two-dimensional transducer array into subarrays of the proper size, each corresponding to one subarray beamformer, and to divide the beamforming operation into two steps. The subarray beamformer only applies fine delays for the individual elements, while the coarse delays are applied in the external imaging system, to reduce the implementation complexity of the on-chip delay lines. In this work, subarray beamforming is performed in the analog domain to achieve a higher power efficiency. A further channel reduction is realized in the digital domain with the aid of compact subarray beamforming ADCs, which directly digitize the output of the analog beamformer and transport the bitstream to the datalink on the ASIC periphery. There, the outputs of every four beamforming ADCs are serialized into data at GHz rates and exported to off-chip FPGAs, thus achieving an extra 4-fold channel reduction.

The analog beamformers are implemented with pipeline-operated switched-capacitor delay lines because of their excellent power efficiency, flexibility and good immunity to process, voltage and temperature (PVT) variations. By summing in the charge domain rather than in the voltage domain, with the outputs of the delay lines being summed passively, a summing amplifier can be omitted. The parasitic capacitance at the summing node must be carefully taken into account, because it causes an undesired low-pass effect. This filtering effect can be prevented by applying a periodic charge reset, at the cost of increased complexity. Based on this architecture, a power- and area-efficient subarray beamforming ADC has also been realized in the charge domain, with the aid of a charge-sharing successive-approximation (SAR) ADC. During quantization, the passively summed signal charge is sequentially cancelled by binary-scaled charge references, implemented as DAC arrays with precharged capacitors.

The switched-capacitor delay lines, however, suffer from charge injection and clock feed-through errors, the mismatch of which causes a ripple pattern with an in-band frequency at the output of the beamformer. This limits the dynamic range of an analog beamformer. In this work, two different approaches are proposed to reduce this interference. The first approach, referred to as mismatch-scrambling, randomizes the operating sequence of the delay stages by adding an extra memory cell and the associated control logic. By doing so, the interfering tones in the spectrum are converted into broadband noise, at the cost of a slightly elevated noise floor. As an alternative, the ripple patterns can be recorded in advance and stored in digital memory, and then subtracted from the output of the beamformer during normal receive operation. This back-end calibration relies on the integration of the subarray beamforming ADC, which digitizes the ripple patterns synchronously with the beamformer clock. The latter approach trades complexity in the digital processing for a better noise performance in the analog front-end.

Minimizing the required power is the main challenge in the design of ultrasound ASICs intended for integration in a probe, and the basic principle of low-power analog design is to increase the efficiency of the current utilization. Based on this principle, a current-reuse transconductor based on a CMOS inverter is used to improve the current efficiency of the LNA. The fact that the LNA is single-ended, however, makes it a challenge to bias the inverter properly. To tackle this problem, several innovative circuit techniques are proposed, including a feedback network with a split capacitor and a bias control with dynamic feedback. Furthermore, a class-AB super source follower is used to drive the analog cables, to save even more power.

An ultrasound ASIC operates synchronously with the transmit/receive cycles of the ultrasound system. That is to say, the receive circuits can be periodically switched to an idle mode during the transmit phase. This feature is fully exploited in this work to further improve the circuit performance. For example, the static bias control in the analog front-end, such as the high-impedance DC feedback path in capacitive-feedback amplifiers and the bias voltage of the current mirrors, can be replaced by switches that are dynamically activated during the transmit phase, to prevent noise from being added during the receive phase.

Finally, effective on-chip power and reference distribution is crucial for realizing well-functioning ultrasound ASICs, because of both the large chip size and the mixed-signal environment. In conventional circuit implementations, however, this concern often works against the pursuit of better energy efficiency, so innovative alternatives are required. In this work, local dual-rail voltage regulators based on simple capacitor-less regulators are implemented to provide sufficient supply-ripple immunity to the single-ended inverter-based LNA array. Each regulator supplies the current of the 9 channels of a sub-array to save power and area. To eliminate large decoupling capacitors and power-hungry voltage buffers, an efficient charge-reference generator based on self-calibrated current sources is proposed to create uniform references for the large-scale ADC array.

The techniques described in this thesis have been realized in several prototypes, including one LNA test chip, one chip for reading out PVDF transducers, two analog beamforming ASICs, and one ASIC in which digitization and the datalink are integrated. All prototypes have been evaluated both electrically and acoustically. The LNA test chip achieved a noise-efficiency factor (NEF) that is 2.5 times better than the state of the art. One of the analog ASICs achieved a power efficiency of 0.27 mW per element with a compact design that matches an element pitch of 150 µm. At the time of writing, this is the highest power efficiency and the smallest element pitch among the latest ultrasound front-end ASICs. The ASIC with integrated beamforming ADC consumed only 0.91 mW per element within the same element area. A comparison with earlier digitization solutions for 3-D ultrasound shows that this work achieved a 10-fold improvement in energy efficiency, as well as a 3.3-fold improvement in integration density.
Samenvatting
**List of Abbreviations**

2-D  Two-dimensional
3-D  Three-dimensional
A/D  Analog-to-digital
ADC  Analog-to-digital converter
AFE  Analog front-end
ASIC  Application-specific integrated circuits
AWG  Arbitrary waveform generator
BER  Bit-error-rate
CDR  Clock-data-recovery
CFA  Capacitive-feedback voltage amplifiers
CMOS  Complementary metal-oxide-semiconductor
CMUT  Capacitive micromachined ultrasound transducers
CW  Continuous wave
DAC  Digital-to-analog converter
DAS  Delay-and-sum
DFF  D-flip-flop
DLL  Delay-locked-loop
DR  Dynamic range
ESD  Electrostatic discharge
FDM  Frequency-division multiplexing
FIFO  First-in, first-out
FPGA  Field-programmable gate array
HV  High-voltage
IC  Integrated circuits
ICE  Intracardiac echocardiography
IVPA  Intravascular photoacoustic
IVUS  Intravascular ultrasound
LDO  Low-dropout regulator
LFSR  Linear-feedback shift register
<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LNA</td>
<td>Low-noise amplifier</td>
</tr>
<tr>
<td>LSB</td>
<td>Least-significant bit</td>
</tr>
<tr>
<td>LVDS</td>
<td>Low-voltage differential signaling</td>
</tr>
<tr>
<td>MIM</td>
<td>Metal-insulator-metal (capacitor)</td>
</tr>
<tr>
<td>MOM</td>
<td>Metal-oxide-metal (capacitor)</td>
</tr>
<tr>
<td>MSB</td>
<td>Most-significant bit</td>
</tr>
<tr>
<td>NEF</td>
<td>Noise-efficiency factor</td>
</tr>
<tr>
<td>NRZ</td>
<td>Non-return-to-zero</td>
</tr>
<tr>
<td>OTA</td>
<td>Operational transconductance amplifier</td>
</tr>
<tr>
<td>PA</td>
<td>Photoacoustic</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed circuit board</td>
</tr>
<tr>
<td>PGA</td>
<td>Programmable-gain amplifier</td>
</tr>
<tr>
<td>PMUT</td>
<td>Piezoelectric micromachined ultrasound transducers</td>
</tr>
<tr>
<td>PRBS</td>
<td>Pseudo-random bit sequence</td>
</tr>
<tr>
<td>PRF</td>
<td>Pulse-repetition frequency</td>
</tr>
<tr>
<td>PRNG</td>
<td>Pseudo-random number generator</td>
</tr>
<tr>
<td>PSF</td>
<td>Point-spread-function</td>
</tr>
<tr>
<td>PSRR</td>
<td>Power-supply-rejection ratio</td>
</tr>
<tr>
<td>PVDF</td>
<td>Polyvinylidene fluoride</td>
</tr>
<tr>
<td>PVT</td>
<td>Process/voltage/temperature</td>
</tr>
<tr>
<td>PZT</td>
<td>Lead zirconate titanate</td>
</tr>
<tr>
<td>RRL</td>
<td>Ripple-reduction-loop</td>
</tr>
<tr>
<td>RX</td>
<td>Receiver</td>
</tr>
<tr>
<td>RZ</td>
<td>Return-to-zero</td>
</tr>
<tr>
<td>S/H</td>
<td>Sample-and-hold</td>
</tr>
<tr>
<td>SAR</td>
<td>Successive-approximation-register</td>
</tr>
<tr>
<td>SC</td>
<td>Switched-capacitor</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal-to-noise ratio</td>
</tr>
<tr>
<td>SNDR</td>
<td>Signal-to-noise-and-distortion ratio</td>
</tr>
<tr>
<td>SPI</td>
<td>Serial peripheral interface bus</td>
</tr>
<tr>
<td>TEE</td>
<td>Transesophageal echocardiography</td>
</tr>
<tr>
<td>TGC</td>
<td>Time-gain compensation</td>
</tr>
<tr>
<td>TIA</td>
<td>Trans-impedance amplifiers</td>
</tr>
<tr>
<td>TSV</td>
<td>Through-silicon-via</td>
</tr>
<tr>
<td>TTE</td>
<td>Transthoracic echocardiography</td>
</tr>
<tr>
<td>TX</td>
<td>Transmitter</td>
</tr>
<tr>
<td>UWB</td>
<td>Ultra-wide-band</td>
</tr>
<tr>
<td>VCDL</td>
<td>Voltage-controlled delay line</td>
</tr>
<tr>
<td>VCSEL</td>
<td>Vertical-cavity surface-emitting laser</td>
</tr>
</tbody>
</table>
LIST OF PUBLICATIONS

**Journal Articles**


**Conference Proceedings**


**Workshop Contributions**

The journey of the last five years has been the most cherished time of my life. I am deeply grateful to all the people who, in one way or another, have helped me during this exciting adventure. Without your great support, it would have been impossible for me to reach this stage.

First and foremost, I would like to express my most sincere thanks to my supervisor, Dr. Michiel Pertijs, for all the invaluable guidance and help that you have given me in the last 7 years. Yes, since the summer of 2011, we have been working together on a variety of projects; how time flies! The long Ph.D. journey has given me ample time to learn from you, especially your serious working attitude and rigorous scholarship as an enthusiastic analog circuit designer. Your unlimited creativity and extremely deep understanding of analog and mixed-signal circuits have also impressed me time and again. What I treasure most, however, is the opportunity to work with you in exploring a new territory and laying the foundation stone of a new building. This journey has never been easy, but it has been full of joy and a sense of accomplishment. Back in July 2013, when we still had no idea how to establish a reliable interconnection between the chip and the transducer, we had a long discussion while attending the IUS conference in Prague. A dozen intriguing ideas were generated at that time, and, surprisingly, many of them became working silicon chips within 4 years! You have granted me full freedom to try out any new ideas and techniques, but never left me alone, as you are always willing to set aside your own priorities to listen to me and brainstorm together. This experience has given me strong confidence in my career, and will continue to motivate me to look for more challenges in the future. Thank you, Michiel.

Next, I would like to thank my promotor, Prof. Nico de Jong. Nico, thanks for guiding me into the ultrasound world and giving me the opportunity to explore this fantastic field. To me you are not only a boss but also a good friend, and probably the best one that I have ever had the pleasure to work for (with). Your kindness and generosity have made me feel very comfortable from the start, and you have granted me the greatest freedom and flexibility to demonstrate those out-of-the-box ideas in electronics design. Besides, you have also shown me what it takes to be the leader of a successful research group.

This multidisciplinary project would not have achieved what it has without the great guidance of the supervisors from the Laboratory of Acoustical Wavefield Imaging (AWI). I would like to thank these respected experts in the field of acoustic imaging and biomedical engineering: Johan Bosch, Martin Verweij and Hendrik Vos. Hans, thank you for your unreserved support and encouragement over so many years, and for your prompt and insightful feedback on all my manuscripts and presentation rehearsals. Martin, thank you for contributing so many interesting ideas. I first learned about acoustic imaging from your lecture. Rik, I can’t remember how many times you showed up in the acoustic lab to address critical problems in our imaging experiments. You are always able to rapidly grasp the key points of our electronic architecture and contribute your own smart ideas. You have set an excellent example for us young researchers.

I would like to express my gratitude to all professors in the Electronic Instrumentation (EI) Laboratory: Prof. Kofi Makinwa, Prof. Paddy French, Prof. Albert Theuwissen, Prof. Johan Huijsing, Prof. Gerard Meijer, Prof. Dr. Andre Bossche, Dr. Stoyan Nihtianov, Dr. Reinoud Wolffenbuttel and Dr. Fabio Sebastiano. I have learned a lot from all of you, especially your serious working attitude and rigorous scholarship, also known as the ‘Delft style’. In particular, I would like to thank Kofi and Albert. Kofi, I’m always impressed by your superior leadership, broad vision and profound understanding of analog integrated circuits. You have set a high standard for analog circuit designers. I treasure every insightful comment or piece of feedback you have given me on my papers, presentations and work plans. Albert, you are probably the most respectable gentleman that I have ever met. It was always a pleasure for me to talk to you, either in the EWI restaurant or at the ISSCC social events. I have learned from you the attitude of an exacting scholar, as well as your enthusiasm for both technology and life.

I would like to thank all the committee members who have approved my thesis: Prof. Sandy Cochran, Dr. Pieter Harpe, Prof. Robert Puers, Prof. Ronald Dekker and Dr. Zili Yu. Your careful review and insightful comments have significantly improved the quality of this thesis. Special thanks go to Zili. We had a half-year overlap in 2013. Since then you have always followed the progress of my project and frequently given me valuable advice. You have also taught me how to achieve a better work/life balance, which has become an asset in my current career. In addition, thanks for your warm hospitality during our trip to Germany in 2015!

Next, I would like to give a big hug to my Ph.D. colleagues in the MICA project, Zhao and Deep. Zhao, my comrade-in-arms, I cannot find the words to express my gratitude to you. I will never forget the 72 sleepless hours of hard work in the summer of 2016, when we were fighting towards the final tape-out of the MICA-3 ASIC. Since our first meeting at Schiphol airport in the cold January of 2014, we have been working together like that for 4 years, and no doubt you are the best work partner that I could ever have in my life. We always think in the same direction, share similar opinions, and work towards a common goal. We can always work out some weird but efficient solution to the technical and non-technical problems that pop up every day. Besides your incredible enthusiasm for work, you also have the spirit of a (British?) gentleman, always being considerate of others and making the people around you feel comfortable.

Deep, my Indian bro, I had so much fun working with you. Since our first talk in the corridor of the AWI lab in 2013, I have been deeply impressed by your solid knowledge of ultrasound imaging and your career experience in different countries, as well as your strong sense of humor (yes!). I miss those sunny or rainy afternoons when we were working together beside the water tank in the AWI lab. You were always willing to answer my silly questions on the ABCs of image reconstruction, while always staying patient (and keeping me relaxed) when I was fighting with unexpected bugs in the measurement setup. Man, wherever you will be in your future career, I will miss you.

During my Ph.D. journey, I have spent about 25% of my time in the AWI lab, where I discussed questions and conducted acoustic experiments with my excellent colleagues there. My great thanks go to Shreyas Raghunathan, Maysam Shabanimotlagh, Emile Noothout, and Verya Daeichin. Without your exceptional efforts, this project would have ended at the level of schematic design. In particular, I would like to thank Emile, who works on the most critical part of this project, i.e. transducer-on-CMOS integration. It is your amazing skills and incredibly hard work that allowed us to demonstrate so many ASIC prototypes with integrated transducers within such a short time frame. There has been too much pressure on you from all sorts of paper-submission deadlines, but you never disappointed us. I also admire your strong sense of responsibility and can-do attitude, which will continue to inspire me in my future career.

I would like to take this opportunity to thank all technicians in both the EI and the AWI lab. Without their technical support, it would have been impossible for our project to come so far.

I would like to express my special appreciation for Zu-yao, our ‘superman’ in the EI lab. I’m deeply grateful for all the substantial help that you have given me in the past 7 years. I cannot even count how many times you have helped me to optimize a PCB layout, or how many hours you have spent resolving the bonding issues of our ASICs. What I admire most, however, is your inner peace, which allows you to remain incredibly patient even when you are confronted with dozens of requests, and which I have always tried to learn from you. It really pains me to know that I might not have a colleague and a teacher like you again in my future career.

Lukasz, I enjoy every moment talking to and working with you. You are always cheerful, enthusiastic and warm-hearted, and are always willing to give me a hand even if you are just about to rush home. Ron, thanks for creating those wonderful images showing the 3-D assembly of our miniature probes. Without them our work would be much less recognized by the public. My sincere thanks also go to Jeroen, Henry, Edo and Robert, who have made indispensable contributions to the construction of our imaging experiment setups.

I would like to thank all other members of Pertijs’ group: Zeyu Cai, Douwe van Willigen, Mingliang Tan and Eunchul Kang. It will be difficult to find such a group of interesting and nice people to work with again in my future career.

Zeyu, we have known each other since the autumn of 2012, and since then we have been not only lab colleagues, but also best friends in life. Thanks for always bringing me happiness and encouragement with your cheerful attitude towards life. I often recall the fantastic times we had traveling together in Graz and San Francisco.

Mingliang, although you appeared as a shy and nervous boy in your first interview, it never occurred to me to doubt that you would become an outstanding circuit designer. You have already proved yourself after two years of hard work, and you will continue to bring us more surprises in your coming Ph.D. journey.

Douwe, you always surprise me with your amazing creativity in both engineering (I love your PCBs!) and art (photos and paintings). I appreciate your efforts in constructing the wiki website for our group; it really helps a lot in keeping our research well-organized. Moreover, thanks a ton for your great help in translating the summary of this thesis!

Eunchul, I always learned from reading your project reports and lab journals, which may be the tidiest and best-organized ones that I have ever read. Also, thanks for your nice group-meeting presentations that introduced me to the world of power electronics!

My project has received substantial technical support from Oldelft Ultrasound B.V., especially in transducer fabrication and measurements. I would like to thank the following individuals, with whom I have closely collaborated for years: Boris Lippe, Jacco Ponte, Christian Prins, Sandra Blaak and Franc van den Adel. Special thanks go to Boris. I always enjoyed those technical discussions with you, ranging from top-level system approaches to tiny details associated with a single transistor. These discussions, whatever the topic, were always filled with wild ideas and innovative solutions, and have helped me a lot in broadening my horizons.

I would like to express my gratitude to Joyce, Karen and Xander, our lovely secretaries of the EI lab. Thanks for your kind and prompt support and care in the past years. Special thanks go to Joyce. Your kindness and enthusiasm always make me feel very welcome and relaxed. I will miss you in the future wherever I am in the world.

I would also like to express my gratitude to all M.Sc. students that I have supervised or co-supervised in these years: Michele D’Urbino, Hao Fan, Weichen Xu, Weihan Hu, Revanth Bellamkonda, Qilong Liu, Peng Chen, Giorgos Karykis and Marco de Stefano. It has been a great pleasure to have the opportunity to work with all of you, to share your joy and sorrow and to witness your growth. I have also learned so much from you, and was always inspired by your enthusiasm for both work and life. Special thanks go to Michele: you have done a good job in keeping my brain active throughout the whole year of 2016 :-)

I would like to thank all my lovely office roommates in HB.15.100: Mohammadimir Ghaderi (Amir), Sining Pan, Johan Vogel, Liqiang Han and Manuel. In particular, my thanks go to Amir. It is my great fortune to have had you sitting next to me for 5 years, and I find you not only a true friend, but also a respected researcher and a good teacher. I appreciate your valuable advice on cleanroom post-processing at the beginning of my Ph.D. study, as well as your guidance in the use of physical modeling tools like COMSOL and TCAD. And by the way, thank you for watering the flowers and keeping our office always in bloom!

I would like to thank all my colleagues in the big EI family: Guijie Wang, Yu Xin, Long Xu, Hui Jiang, Qinwen Fan, Jing Li, Qing Ding, Pelin Ayerden, Saleh Heidary Shalmany, Bahman Yousefzadeh, Lorenzo Pedala, Shuang Xie, Fei Wang, Burak Gönen, Junfeng Jiang, Ugur Sonmez, Sander Flipse, Nikola Radelijc-Jakic, Qi Gan and Vikram Chaturvedi. Thanks for all the good times we have spent together on day-outs, dinners, conferences, courses and other activities. Special thanks to Ms. Wang and her family for their warm-hearted support and care in the past years. You always made me feel at home in the Netherlands.

I would also like to thank all my senior colleagues in the EI lab: Zhichao Tan, Ruimin Yang, Yang Xu, Jiawei Xu, Sha Xia, Jiamin Tan and Ning Xie. Thanks for all your valuable help and guidance in the early years of my Ph.D. study.

I would like to thank my friends here. My thanks go to Chao Zhang, Dong Liu, Yuhui Peng, Fengli Wang, Bindi Wang, Yuxin Yan, Yuanxin Xu, Xinqian Fan, Zhuolin Liao, Yan Jin, Mengting Jiang and Yana Li. Thank you for all the precious help you have given me and all the good times we have enjoyed together at parties, in sports and while travelling.

Of course, I would like to thank Xiaoliang, my special one. Thank you for sharing with me the ups and downs of every day, and for the deepest love and understanding you give me. Your persistence in pursuing your dreams also inspires me to move forward in my life.

Finally, I would like to give my deepest gratitude to my family. Thanks Papa and Mama, for all that you have done for me in the past 30 years. Only with your support could I find the courage to travel and study across the globe. Thank you for your continuous love, care and encouragement throughout my life. To my grandparents: you will always live in my heart.

Chao Chen
March 23, 2018
in Antwerp, Belgium
Acknowledgements
Chao Chen was born in Longhai, Fujian Province of China in December 1987. He received his B.Sc. degree in 2010 from Tsinghua University, Beijing, China, and his M.Sc. degree (cum laude) in 2012 from Delft University of Technology, Delft, The Netherlands, both in microelectronics. For his M.Sc. thesis, he worked on energy-efficient self-timed incremental ΔΣ ADCs.

Since December 2012, he has been working toward his Ph.D. degree at the Electronic Instrumentation Laboratory, Delft University of Technology, where he works on ASIC design for medical ultrasound imaging. Since November 2017, he has been a senior analog IC designer with Butterfly Network Inc., Guilford, CT, USA, where he designs ASICs for next-generation medical ultrasound systems. His research interests include integrated circuits for medical ultrasound imaging and data converters.

Mr. Chen was the recipient of the TU Delft Toptalent Scholarship in 2010 and the Dutch government Huygens Scholarship in 2011. During his Ph.D. study, he received several academic honors, including the ISSCC 2013 and IUS 2015 STGA awards, the IUS 2017 Student Paper Competition Winner award and the ASSCC 2017 Best Student Paper award (co-authored).

In his spare time, he likes travelling, reading history and playing soccer.