A Front-End ASIC with Receive Sub-array Beamforming Integrated with a 32 × 32 PZT Matrix Transducer for 3-D Transesophageal Echocardiography

Chen, Chao; Chen, Zhao; Bera, Deep; Raghunathan, Shreyas; ShabaniMotlagh, Maysam; Noothout, Emile; Chang, Zu Yao; Ponte, Jacco; Prins, Christian; Vos, Rik

DOI
10.1109/JSSC.2016.2638433

Publication date
2017

Document Version
Accepted author manuscript

Published in
IEEE Journal of Solid State Circuits

A Front-end ASIC with Receive Sub-Array Beamforming Integrated with a 32 × 32 PZT Matrix Transducer for 3-D Transesophageal Echocardiography

Chao Chen¹, Zhao Chen¹, Deep Bera³, Shreyas B. Raghunathan², Maysam Shabanimotlagh², Emile Noothout², Zu-yao Chang¹, Jacco Ponte⁴, Christian Prins⁴, Hendrik J. Vos²,³, Johan G. Bosch³, Martin D. Verweij²,³, Nico de Jong²,³, Michiel A.P. Pertijs¹

¹Electronic Instrumentation Lab., Delft University of Technology, Delft, The Netherlands
²Lab. of Acoustical Wavefield Imaging, Delft University of Technology, The Netherlands
³Dept. of Biomedical Engineering, Thoraxcenter, Erasmus MC, Rotterdam, The Netherlands
⁴Oldelft Ultrasound, Delft, The Netherlands

Abstract

This paper presents a power- and area-efficient front-end ASIC that is directly integrated with an array of 32 × 32 piezoelectric transducer elements to enable next-generation miniature ultrasound probes for real-time 3-D transesophageal echocardiography. The 6.1 × 6.1 mm² ASIC, implemented in a low-voltage 0.18 μm CMOS process, effectively reduces the number of cables required in the probe’s narrow shaft by means of 96 delay-and-sum beamformers, each of which locally combines the signals received by a sub-array of 3 × 3 elements. These beamformers are based on pipeline-operated analog sample-and-hold stages, and employ a mismatch-scrambling technique to prevent the ripple signal associated with mismatch between these stages from limiting the dynamic range. In addition, an ultra-low-power LNA architecture is proposed to increase the power-efficiency of the receive circuitry. The ASIC has a compact, element-matched layout, and consumes less than 230 mW while receiving. Its functionality has been successfully demonstrated in 3-D imaging experiments.

I. INTRODUCTION

Volumetric visualization of the human heart is essential for the accurate diagnosis of cardiovascular diseases and the guidance of interventional cardiac procedures. Echocardiography, which images the heart using ultrasound, has become an indispensable modality in cardiology because it is safe, relatively inexpensive and capable of providing real-time images. Transesophageal echocardiography (TEE), as its name indicates, generates ultrasonic images from the esophagus, by utilizing an ultrasound transducer array mounted at the tip of a gastroscopic tube (Fig. 1). Conventionally, the elements of the transducer array are connected using micro-coaxial cables to an external imaging system, where properly-timed high-voltage pulses are generated to transmit an acoustic pulse, and the resulting echoes are recorded and processed to form an image.

2-D TEE probes are widely used in clinical practice. They employ a 1-D phased-array transducer to obtain cross-sectional images of the heart. However, such 2-D images often fall short in providing comprehensive visual information for complex cardiac interventions, such as minimally-invasive valve replacements and septal-defect closures. Appropriate real-time 3-D imaging would be very beneficial for improving the success rate of such procedures [1].

The relatively large probe heads (typically > 10 cm³) of current 3-D TEE probes cannot be tolerated by the patient during longer procedures (unless general anesthesia is applied) and are too large for pediatric use. For longer-term monitoring and pediatric use, the volume of the probe tip should be limited to 1 cm³, and the tube diameter to 5 to 7 mm [2]. To enable real-time 3-D imaging, a 2-D phased array is required. For an $N \times N$ array, the achievable signal-to-noise ratio (SNR) and the lateral resolution both scale linearly with $N$. It is therefore desirable to make full use of the array aperture available within the probe tip (5 × 5 mm²). In addition, the pitch of the transducer elements should be kept within half of the acoustic wavelength (λ) to minimize grating lobes and to ensure proper spatial imaging resolution [14]. For a 2-D array with a center frequency of 5 MHz, this corresponds to a pitch of 150 μm, leading to at least 32 × 32 elements. Accommodating the corresponding number of micro-coaxial cables within the narrow gastroscopic tube is difficult or even impossible. Decreasing the aperture size to reduce the number of channels would significantly degrade both the SNR and the lateral resolution. As a result, channel reduction should be performed locally, with the aid of miniaturized in-probe electronics, to reduce the number of cables [3].
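
The pitch and channel-count argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a speed of sound of 1540 m/s in soft tissue (a value not stated in the text); all constant names are illustrative.

```python
# Assumed value: the text does not state the speed of sound used.
SPEED_OF_SOUND_M_S = 1540.0
# Values taken from the text.
CENTER_FREQ_HZ = 5e6
PITCH_UM = 150
ELEMENTS_PER_SIDE = 32
APERTURE_LIMIT_UM = 5000  # 5 mm side of the available aperture

# Half of the acoustic wavelength, in micrometres.
half_wavelength_um = SPEED_OF_SOUND_M_S / CENTER_FREQ_HZ / 2 * 1e6
# Side length of the array and number of elements (one cable each without
# in-probe channel reduction).
array_side_um = ELEMENTS_PER_SIDE * PITCH_UM
channel_count = ELEMENTS_PER_SIDE ** 2

print(f"half wavelength : {half_wavelength_um:.0f} um")
print(f"array side      : {array_side_um} um")
print(f"channel count   : {channel_count}")
```

Under this assumption the 150 μm pitch stays just below half a wavelength, the 32 × 32 array fits within the 5 mm aperture, and a direct connection would need 1024 signal cables.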

A variety of approaches have been proposed to reduce the cable count in endoscopic and catheter-based ultrasound systems. Part of the beamforming function, which is conventionally performed in the external imaging system to achieve spatial directivity and enhance the signal-to-noise ratio, can be moved into the probe [4, 5]. Time-division multiplexing approaches have been applied in [6, 7] to allow multiple elements to share a single cable. Solutions based on element-switching schemes [8, 9] have also been reported. All these approaches rely on the realization of a front-end ASIC that is closely integrated with the transducer array.

Design of such front-end ASICs is challenging in several aspects. First, the power consumption of the ASIC, which contributes to the overall self-heating of the probe, should be kept below an estimated 0.5 W [10], to avoid excessive tissue temperature rise [11]. This translates to 0.5 mW/element for a 1000-element array and is beyond the state-of-the-art of front-end ultrasound ASICs, which consume at least 1.4 mW/element [9, 12, 13]. Another challenge comes from the dense interconnection between the ASIC and the transducer array. Direct transducer-on-chip integration is desired, as it not only helps to get a small form factor, but also reduces the parasitic interconnect capacitance added to each transducer element. This calls for an element-matched ASIC layout, with a pitch identical to that of the transducer elements, and hence for a highly compact circuit implementation. Prior works [12, 15] compromised somewhat on the imaging quality by opting for a slightly larger pitch. Indirect transducer-to-chip integration via interposer PCBs [5, 9] allows the use of a different pitch for the transducer array and the ASIC. However, the limited space within the TEE probe tip precludes this option.

In this paper, we present a front-end ASIC that is optimized in both system architecture and circuit-level implementation to meet the stringent requirements of 3-D TEE probes [16]. It is directly integrated with an array of 32 × 32 piezoelectric transducer elements, which are split into a transmit and a receive array to facilitate the power and area optimization of the ASIC [17]. The receive elements are further divided into 96 sub-arrays, each with a switched-capacitor-based beamformer, to realize a 9-fold cable reduction. In addition, an ultra-low-power LNA architecture [18], which incorporates an inverter-based OTA with a bias scheme tailored for ultrasound imaging, is proposed to increase the power-efficiency of the receive circuitry, while keeping the area compact. Furthermore, a mismatch-scrambling technique is applied to mitigate the effects of mismatch between the beamformer stages, and thus improve the overall dynamic range of the ASIC while receiving. These circuit techniques, while designed for PZT matrix transducers, are also relevant for other types of ultrasound transducers, such as capacitive micromachined ultrasonic transducers (CMUTs). The functionality of the ASIC as well as the effectiveness of the proposed techniques have been successfully demonstrated by imaging experiments.

The paper is organized as follows. Section II describes the proposed system architecture. Section III discusses the details of the circuit implementation. Experimental results are presented in Section IV. Conclusions are given at the end of the paper.

II. SYSTEM ARCHITECTURE
A. Transducer Matrix Configuration

In conventional ultrasound probes, each transducer element is used both as transmitter and receiver. A high-voltage CMOS process is then needed to generate the transmit pulses of typically tens of Volts [13]. The integration density of high-voltage processes is generally lower than that of their low-voltage counterparts with the same feature size, which is disadvantageous for ASICs that directly interconnect with 2-D transducer arrays with a tiny element pitch.

In this work, we use an array of $32 \times 32$ PZT elements with separate transmit and receive elements (Fig. 2). An $8 \times 8$ central sub-array is directly wired out to transmit channels in the external imaging system using metal traces in the ASIC that run underneath 96 un-connected elements to bond-pads on the chip’s periphery. All other 864 elements are connected directly to on-chip receiver circuits, whose outputs are fed to the imaging system’s receive channels.

The use of a small central transmit array helps in reducing the overall cable count as well as obtaining a large opening angle while receiving. With respect to the conventional array configuration where each transducer element is used for both transmit and receive, our scheme trades lateral resolution for a higher frame rate. In our scanning procedure, the transmitter is used to generate only a few wide beams, illuminating an area that can accommodate a number of parallel receive beams per transmit pulse, thus yielding a high frame rate. On the other hand, it must also be ensured that the generated acoustic pressure is adequate for the target imaging depth. According to our numerical simulations in PZFlex (Weidlinger Associates Inc., Mountain View, CA, USA), 64 elements should be capable of generating sufficient pressure for an imaging depth up to 10 cm. Moreover, despite the missing elements in the receiver aperture, the point spread function (PSF) is comparable with that of a fully-populated receiver, as shown by simulations in [19]. This configuration allows the use of a dense low-voltage CMOS technology, thus saving power and circuit area. Compared to [12], which uses the majority of elements to transmit and a sparse array to receive, it achieves better receiving sensitivity as well as lower side-lobes. Moreover, it also helps to reduce the overall in-probe heat dissipation, as transmit circuits normally consume more power [9].

The transducer array was constructed by dicing a bulk piezoelectric material (CTS 3203 HD) into a matrix. It is directly mounted on top of the front-end ASIC using the PZT-on-CMOS integration scheme described in [10]. The PZT matrix measures 4.8 mm × 4.8 mm with an element pitch of 150 μm and a dicing kerf width of 20 μm. It was designed for a center frequency of 5 MHz and a 50% bandwidth (3.75 MHz to 6.25 MHz).

B. Sub-array Beamforming in Receive

The cable-count reduction approach that we adopted in this work is to perform partial receive beamforming in the ASIC. The basic principle of ultrasound beamforming is to apply appropriate relative delays to the received signals in such a way that ultrasound waves coming from the focal point arrive simultaneously and can be constructively combined. Full-array beamforming for 32 × 32 transducer elements is impractical for circuit implementation due to the large delay depth required for each element. The sub-array beamforming scheme [4], also known as “micro-beamforming” [17], mitigates this issue by dividing the beamforming task into two steps. A coarse delay that is common for all elements within one sub-array is applied in the external imaging system, while only the fine delays for the individual elements are applied by the sub-array beamformers in the ASIC.

The sub-array size is determined based on the following concerns. First, in order to keep the symmetry of the beamforming in lateral and elevation directions, a square sub-array is desired. Besides, a larger sub-array brings a more aggressive cable-count reduction, but comes at the cost of an elevated grating-lobe level and a greater delay depth required per element. We selected a 3 × 3 configuration to achieve a reasonable acoustic imaging quality, while reducing the number of cables by a factor of 9 [20]. Accordingly, the 864 receive elements of the transducer matrix are divided into 96 sub-arrays and interfaced with 96 sub-array receiver circuits in the ASIC.

The fine delays are programmable in steps of 30 ns up to 210 ns, allowing the sub-array’s directivity to be steered over angles of $0^\circ$, $\pm 17^\circ$, and $\pm 37^\circ$ in both azimuthal and elevation directions [10]. All sub-arrays can be programmed identically, which is appropriate for far-field beamforming and requires loading of only 9 delay settings into the ASIC, which has a negligible impact on the frame rate. The ASIC is also equipped with a mode in which all sub-arrays can be programmed individually (i.e. $96 \times 9$ settings), allowing near-field focusing at the expense of a longer loading time, and hence a slower frame rate.
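
The relation between the delay step and the steering angles can be illustrated with the plane-wave condition $\sin\theta = c\,\Delta t / p$, where $\Delta t$ is the delay increment between neighbouring elements and $p$ is the pitch. The sketch below assumes a speed of sound of 1500 m/s and delay increments of 0, ±1 and ±2 steps between neighbouring elements; neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 1500.0  # assumed; not stated in the text
PITCH_M = 150e-6             # element pitch, from the text
DELAY_STEP_S = 30e-9         # delay resolution, from the text


def steering_angle_deg(steps_per_element):
    """Steering angle for a given delay increment between neighbouring elements.

    steps_per_element is the (signed) number of 30 ns steps by which the
    delay changes from one element to the next along one axis.
    """
    sine = SPEED_OF_SOUND_M_S * steps_per_element * DELAY_STEP_S / PITCH_M
    return math.degrees(math.asin(sine))


for steps in (-2, -1, 0, 1, 2):
    print(f"{steps:+d} step(s) per element -> {steering_angle_deg(steps):+.1f} deg")
```

With these assumptions, one and two steps per element give angles that round to the ±17° and ±37° quoted above.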

III. CIRCUIT IMPLEMENTATION

Fig. 3 shows the schematic of a $3 \times 3$ sub-array receiver. It consists of 9 LNAs, 9 buffers, 9 analog delay lines, a programmable-gain amplifier (PGA) and a cable driver. The LNA output is AC-coupled to a flipped source follower buffer that drives the analog delay line. The joint output of all 9 analog delay lines is then amplified by the PGA. A cable driver buffers the output signal of the PGA to drive the micro-coaxial cable connecting to the imaging system.

The echo signals received by the transducer elements have a dynamic range of about 80 dB, 40 dB of which is associated with the fact that echoes from deeper tissue are attenuated more along their propagation path. The gains of the LNA and the PGA are programmable to compensate for this attenuation. The LNA provides a voltage gain of up to 24 dB, to suppress the noise contribution of the subsequent stages at small signal levels. The gain can be reduced to 6 dB or -12 dB to avoid output saturation at high signal levels. The PGA provides an additional switchable gain with finer steps (0, 6, 12 dB) to interpolate between the gain steps of the LNA.

As described in Section I, all the above circuits, along with their biasing and digital control circuits, must be implemented within the area of a $3 \times 3$ sub-array, i.e. $450 \, \mu m \times 450 \, \mu m$, while consuming less than $4.5 \, mW$. Dedicated circuit techniques, discussed in the remainder of this section, have been applied to meet these requirements.
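
The way the PGA interpolates between the coarse LNA gain steps can be tabulated directly from the gain values given above. This is a small illustrative sketch of the nominal (design) gains only; the function name is hypothetical.

```python
LNA_GAINS_DB = (-12, 6, 24)  # coarse LNA gain settings, from the text
PGA_GAINS_DB = (0, 6, 12)    # fine PGA gain settings, from the text


def gain_plan():
    """Map each nominal total gain to the (LNA, PGA) pair that produces it."""
    plan = {}
    for lna_db in LNA_GAINS_DB:
        for pga_db in PGA_GAINS_DB:
            plan[lna_db + pga_db] = (lna_db, pga_db)
    return dict(sorted(plan.items()))


for total_db, (lna_db, pga_db) in gain_plan().items():
    print(f"total {total_db:+3d} dB = LNA {lna_db:+3d} dB + PGA {pga_db:+3d} dB")
```

The nine combinations are all distinct and cover a nominal 48 dB range in uniform 6 dB steps.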

A. LNA

The choice of the ultrasound LNA topology is dictated by the electrical impedance of the target transducer. Trans-impedance amplifiers (TIA) are widely used in readout ICs for CMUT transducers because of their relatively high impedance [21]. However, a similarly-sized PZT transducer has a much lower impedance around the resonance frequency, typically a few $k\Omega$ for our transducers (Fig. 4). In view of this, the TIA topology falls short in achieving an optimal noise/power trade-off, since creating a low enough input impedance requires extra power to be spent on increasing the open-loop gain, rather than on suppressing the input-referred noise [18]. In this work, instead, we use a capacitive-feedback voltage amplifier, shown in Fig. 5, which offers a mid-band voltage gain of $A_M = C_I / C_F$. Its input impedance is dictated by the input capacitor $C_I$ and can be easily sized to tens of $k\Omega$ within the transducer bandwidth, so as to sense the transducer’s voltage rather than its current.
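
To give a feel for the numbers involved, the sketch below evaluates the magnitude of the input impedance $1/(2\pi f C_I)$ and the feedback capacitor implied by $A_M = C_I / C_F$. The 1 pF input capacitor is purely a hypothetical value chosen for illustration; the actual capacitor sizes are not given in the text.

```python
import math


def capacitor_impedance_ohm(capacitance_f, frequency_hz):
    """Magnitude of the impedance of an ideal capacitor."""
    return 1.0 / (2 * math.pi * frequency_hz * capacitance_f)


def feedback_capacitor_f(input_capacitance_f, gain_db):
    """Feedback capacitor C_F that yields a mid-band gain A_M = C_I / C_F."""
    return input_capacitance_f / 10 ** (gain_db / 20)


C_IN_F = 1e-12  # hypothetical input capacitor, NOT a value from the paper

for f_hz in (3.75e6, 5e6, 6.25e6):  # band edges and center frequency
    z_kohm = capacitor_impedance_ohm(C_IN_F, f_hz) / 1e3
    print(f"|Z_in| at {f_hz / 1e6:.2f} MHz: {z_kohm:.1f} kOhm")

c_f = feedback_capacitor_f(C_IN_F, 24)  # highest LNA gain setting
print(f"C_F for 24 dB gain: {c_f * 1e15:.1f} fF")
```

For this example value, the input impedance stays in the tens-of-kΩ range across the transducer bandwidth, well above the few-kΩ source impedance mentioned above.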

A current-reuse OTA based on a CMOS inverter is employed to enhance the power-efficiency of the LNA. In previous inverter-based designs [22], extra level-shifting capacitors ($C_{LS}$) are used to independently bias the NMOS and PMOS transistors, as shown in Fig. 6(a). These level-shifting capacitors and the associated parasitic capacitors at the virtual ground node form a capacitive divider, which attenuates the input signal and thus increases the input-referred noise of the LNA. Enlarging $C_{LS}$ helps in reducing this noise penalty, at the cost of increased die area. In this work, the level-shifting capacitors are eliminated by applying a split-capacitor feedback network [18, 23]. As shown in Fig. 6(b), the input bias points for the NMOS and PMOS transistors are de-coupled by splitting the input and feedback capacitors into two equal pairs, which maintains the same mid-band gain $C_I / C_F$ and the same input impedance.

To maximize the output swing, the bias voltage of the inverter-based OTA should be properly defined. This is usually achieved with the aid of a DC control loop, in which a slow auxiliary amplifier keeps the output at the desired operating point [22]. However, such a DC control loop would recover too slowly from disturbances caused by the high-voltage pulses propagating across the ASIC during the transmit phase. Instead, we dynamically activate the bias control loop in synchronization with the transmit/receive (TX/RX) cycles of the ultrasound system, as shown in Fig. 7. During the TX phase, the LNA is essentially auto-zeroed while the auxiliary amplifier drives the gate of the NMOS transistor so as to bias the output at mid-supply. During the RX phase, the auxiliary amplifier is disconnected, and both its inputs are shorted to the mid-supply. Meanwhile, the LNA starts receiving the echo signal by operating at the “memorized” bias points. Given that the typical TX/RX cycle in cardiac imaging is relatively short, ranging from 100 μs to 200 μs, the bias voltage hardly drifts during the RX phase. The relatively large sizes of the input transistors ($W/L_N = 75/0.2, W/L_P = 60/0.2$), needed for flicker-noise reduction, also help to keep the bias voltages stable.

A well-known downside of a single-ended inverter-based OTA is its poor power-supply-rejection ratio (PSRR) [24]. As the LNAs are closely integrated with high-frequency digital circuits for beamformer control, the supply line and the ground are inevitably noisy. To improve the PSRR, we generate two internal power rails within each sub-array by means of two regulators ($REG_P$ and $REG_N$ in Fig. 8) that are shared by the 9 LNAs of a sub-array. Given that the load currents of these regulators are known and approximately constant, their implementation can be kept rather simple to save area. A capacitor-less LDO based on a super source-follower [25], capable of providing a PSRR better than 40 dB at 5 MHz, is adopted as the topology for both regulators.

Fig. 8 shows the complete schematic of the proposed LNA. The inverter-based OTA is cascoded to ensure an accurate closed-loop gain, and input transistors $M_1$ and $M_4$ are biased in weak inversion to optimize their current-efficiency. The bias voltage of $M_1$, $V_{refP}$, which is derived from a diode-connected PMOS transistor via a high-impedance pseudo-resistor, is shared by the input gate of the positive-rail regulator $REG_P$. Thus, the bias current of the OTA can be defined by the difference of the reference currents ($I_{p1} - I_{p2}$) and the dimension ratio of $M_1$ and $M_{p1}$. In each channel, a unity-gain-connected inverter, implemented with long-channel transistors and consuming only 0.4 $\mu$A, is connected between the two regulated power rails to generate a mid-supply reference of approximately 900 mV. The auxiliary amplifier for DC bias control is realized as a simple differential pair. With a current consumption of less than 1 $\mu$A, it is capable of settling within the 10 $\mu$s TX phase. A switchable capacitive feedback network is implemented to provide the three gain levels mentioned above for dynamic range enhancement.

B. Sub-Array Beamformer

Fig. 9 shows the circuit implementation and timing diagram of the sub-array beamformer. It consists of 9 programmable analog delay lines, each of which is built from pipeline-operated S/H memory cells that run at a sampling rate of 33 MHz, corresponding to the target delay resolution of 30 ns.

The outputs of all 9 delay lines are passively joined together to sum up and average the charge sampled on the capacitors that are connected to the output node [10]. Compared to voltage-mode summation [26, 27], this scheme eliminates the need for a summing amplifier, and is thus more compact and power-efficient. However, a potential source of errors is the residual charge stored on the parasitic capacitance at the output node, which causes a fraction of the output of the previous clock cycle to be added to the output signal. This is equivalent to an undesired first-order infinite-impulse-response low-pass filter. While this filtering can be eliminated by periodically removing the charge from the output node using a reset switch [10], here we opt for the simpler solution of minimizing the parasitic capacitance at the output node. It can be shown that an acceptable signal attenuation of less than 3 dB within the bandwidth of 0-10 MHz is obtained if this parasitic capacitance is less than 20% of the total capacitance at the output node, which can be easily achieved with a careful layout.
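
The 3 dB / 20% statement above can be checked with a simple charge-sharing model. The sketch assumes that, in each clock cycle, the output settles to $y[n] = (1-a)\,x[n] + a\,y[n-1]$, where $x[n]$ is the average of the sampled voltages and $a$ is the ratio of the parasitic capacitance to the total capacitance at the output node. This first-order model and the function name are assumptions made for illustration, not the authors' derivation.

```python
import cmath
import math

SAMPLE_RATE_HZ = 33e6  # S/H clock rate, from the text


def attenuation_db(parasitic_fraction, frequency_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Attenuation of the assumed filter H(z) = (1 - a) / (1 - a z^-1).

    parasitic_fraction is a = C_parasitic / C_total at the summing node.
    Returns a positive number of dB for a signal that is attenuated.
    """
    a = parasitic_fraction
    z_inverse = cmath.exp(-2j * math.pi * frequency_hz / sample_rate_hz)
    response = (1 - a) / (1 - a * z_inverse)
    return -20 * math.log10(abs(response))


for fraction in (0.05, 0.10, 0.20, 0.30):
    loss = attenuation_db(fraction, 10e6)
    print(f"parasitic = {fraction:.0%} of total -> {loss:.2f} dB loss at 10 MHz")
```

In this model a 20% parasitic fraction gives a loss at 10 MHz that stays below 3 dB, while a 30% fraction exceeds it, consistent with the bound quoted above.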

The control logic for programming the delay lines is also integrated within each sub-array. Its core is a delay stage index rotator that determines the sequence in which the memory cells are used, as conceptually shown in Fig. 10. The detailed circuit implementation is shown in Fig. 11. It consists of an 8-stage shift register (D1-D8) in which the 4-bit binary indices of memory cells (1-8) are stored and rotated. Upon startup, register Dn is preset to n. D1 stores the index of the memory cell used for sampling the input signals, while D2-D8 store the indices of candidate memory cells for readout. A 3-bit selection code, provided by a built-in SPI interface, decides which of these candidates is used, allowing the delay depth of the individual delay line to be programmed. One-hot codes derived from the selected 4-bit binary indices are re-timed by non-overlapped clocks to control the sample/readout switches in the memory cells.
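
The behaviour of the index rotator can be illustrated with a small behavioural model. The sketch below assumes that the indices shift from Dn to Dn+1 with D8 wrapping around to D1, and that a programmed delay of k steps reads the cell indexed by D(k+1); the shift direction, the mapping of the selection code, and the function name are assumptions, as the text does not specify them.

```python
def run_delay_line(samples, delay_steps):
    """Behavioural model of one delay line built from 8 rotating memory cells.

    delay_steps selects one of the candidate registers D2..D8 (1..7 steps).
    Returns the output sequence; entries are None until the line has filled.
    """
    if not 1 <= delay_steps <= 7:
        raise ValueError("delay_steps must be between 1 and 7")
    registers = list(range(1, 9))  # D1..D8, preset so that Dn holds index n
    cells = {}                     # memory cell index -> stored sample
    outputs = []
    for sample in samples:
        cells[registers[0]] = sample                        # D1 selects the cell that samples
        outputs.append(cells.get(registers[delay_steps]))   # D(k+1) selects the cell read out
        registers = [registers[-1]] + registers[:-1]        # rotate the indices by one stage
    return outputs


inputs = list(range(100, 112))
print(run_delay_line(inputs, 3))
```

A sample written while its cell index sits in D1 is read back when that index has moved k stages along the register, so the output is the input delayed by k clock periods.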

As mentioned in Section II, the SPI interfaces in all sub-arrays can be either loaded in parallel, or configured as a daisy-chain to load different delay-patterns to individual sub-arrays. With a 50 MHz SPI clock, only 0.54 μs is needed to program the ASIC’s delay pattern in the parallel mode, while for the daisy-chain mode it takes about 13 μs (sub-arrays in each quadrant of the ASIC form one daisy-chain), leading to a 9% frame rate reduction with an imaging depth of 10 cm. As such, the daisy-chain mode enables near-field focusing at the expense of a slightly slower frame rate.
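
The loading times quoted above follow from the number of bits that must be shifted in. The sketch below reproduces them, assuming that the frame-rate penalty is the loading time added to an acoustic round-trip time computed with a speed of sound of 1500 m/s; that assumption and the variable names are illustrative and not taken from the text.

```python
SPI_CLOCK_HZ = 50e6          # from the text
BITS_PER_DELAY = 3           # 3-bit selection code per delay line
ELEMENTS_PER_SUBARRAY = 9
SUBARRAYS = 96
DAISY_CHAINS = 4             # one chain per quadrant of the ASIC
SPEED_OF_SOUND_M_S = 1500.0  # assumed; not stated in the text
IMAGING_DEPTH_M = 0.10

bits_per_subarray = BITS_PER_DELAY * ELEMENTS_PER_SUBARRAY
parallel_load_s = bits_per_subarray / SPI_CLOCK_HZ
subarrays_per_chain = SUBARRAYS // DAISY_CHAINS
daisy_chain_load_s = subarrays_per_chain * bits_per_subarray / SPI_CLOCK_HZ

# Assumed model: each transmit/receive event lasts one acoustic round trip,
# and the daisy-chain loading time is added on top of it.
round_trip_s = 2 * IMAGING_DEPTH_M / SPEED_OF_SOUND_M_S
rate_reduction = daisy_chain_load_s / (round_trip_s + daisy_chain_load_s)

print(f"parallel load   : {parallel_load_s * 1e6:.2f} us")
print(f"daisy-chain load: {daisy_chain_load_s * 1e6:.2f} us")
print(f"frame-rate loss : {rate_reduction:.1%}")
```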

C. Mismatch-scrambling

The S/H memory cells suffer from charge injection and clock feed-through errors, the mismatch of which introduces a ripple pattern with a period of 8 delay steps (240 ns) at the output of the delay lines. Such a ripple pattern manifests itself as undesired in-band tones in the output spectrum of the beamformer, which limit the dynamic range of the signal chain.

To mitigate this interference, we propose a mismatch-scrambling technique by adding an extra memory cell and a redundant index register D9, as shown in both Fig. 10 and Fig. 11. A pseudo-random number generator (PRNG) embedded in each sub-array generates a bit sequence (PRBS) that decides whether the index of D8 or D9 shifts into D1, while the other index shifts into D9. Thus, memory cells are randomly taken out and inserted back into the sequence. This operation randomizes the ripple pattern and converts the interfering tones into broadband noise. The mismatch-scrambling function can be switched on/off with a control bit (MS_EN in Fig. 11).
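
The effect of the extra cell on the ripple pattern can be seen in a behavioural sketch. It assumes the same shift direction as a plain Dn to Dn+1 rotation, uses Python's `random.Random` as a stand-in for the on-chip PRBS, and assigns each memory cell an arbitrary fixed offset; all function names and offset values are illustrative.

```python
import random


def read_cell_sequence(n_steps, delay_steps=3, scramble=False, seed=1):
    """Return the index of the memory cell read out at each clock step."""
    prbs = random.Random(seed)  # stand-in for the on-chip PRBS generator
    chain = list(range(1, 9))   # D1..D8
    spare = 9                   # D9, the redundant index register
    sequence = []
    for _ in range(n_steps):
        sequence.append(chain[delay_steps])
        incoming = chain.pop()  # index leaving D8
        if scramble and prbs.getrandbits(1):
            # D9 shifts into D1 instead, and the D8 index is parked in D9.
            incoming, spare = spare, incoming
        chain.insert(0, incoming)
    return sequence


def ripple(sequence, offsets):
    """Offset error seen at the output, one value per clock step."""
    return [offsets[cell] for cell in sequence]


def is_periodic(values, period):
    return all(a == b for a, b in zip(values, values[period:]))


offsets = {cell: 0.1 * cell for cell in range(1, 10)}  # arbitrary per-cell errors
plain = ripple(read_cell_sequence(800), offsets)
scrambled = ripple(read_cell_sequence(800, scramble=True), offsets)
print("period-8 ripple without scrambling:", is_periodic(plain, 8))
print("period-8 ripple with scrambling   :", is_periodic(scrambled, 8))
```

Because the index that enters D1 always comes from D8 or D9, a cell is only overwritten after it has passed through the whole chain, so the programmed delay is unaffected while the order in which cell errors appear is randomized.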

The PRNG in each sub-array is implemented as a 12-bit Galois linear-feedback shift register (LFSR) [28]. It can be re-configured as a shift register to allow the sequential loading of its initial state, i.e. the seeds. Similar to the daisy-chain mode of the delay-pattern SPI interface, these shift registers can also be cascaded to allow different seeds to be loaded into the individual sub-arrays. Applying a set of randomized seeds for all sub-arrays is expected to further de-correlate the sequences of memory cell rotation on the scale of the full-array. As a result, the excess noise generated by the scrambling process can be suppressed when the output signals of the sub-arrays are combined by the beamforming operation in the imaging system, thus improving the SNR.
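
A 12-bit Galois LFSR of the kind described above can be modelled in a few lines. The feedback polynomial used on the chip is not given in the text; the sketch assumes $x^{12} + x^{11} + x^{10} + x^{4} + 1$ (tap mask `0xE08`), a commonly tabulated maximal-length choice, and the function names are illustrative.

```python
TAP_MASK = 0xE08  # assumed polynomial x^12 + x^11 + x^10 + x^4 + 1


def lfsr_step(state, tap_mask=TAP_MASK):
    """Advance a 12-bit Galois LFSR by one clock; return (new_state, output_bit)."""
    output_bit = state & 1
    state >>= 1
    if output_bit:
        state ^= tap_mask
    return state, output_bit


def prbs(seed, n_bits):
    """Generate n_bits of the pseudo-random bit sequence from a non-zero seed."""
    if seed & 0xFFF == 0:
        raise ValueError("the all-zero state never leaves zero")
    state = seed & 0xFFF
    bits = []
    for _ in range(n_bits):
        state, bit = lfsr_step(state)
        bits.append(bit)
    return bits


def cycle_length(seed):
    """Number of clocks before the register returns to its seed state."""
    state, count = lfsr_step(seed & 0xFFF)[0], 1
    while state != seed & 0xFFF:
        state, _ = lfsr_step(state)
        count += 1
    return count


print("cycle length:", cycle_length(1))
print("seed 0x001  :", prbs(0x001, 16))
print("seed 0x5A5  :", prbs(0x5A5, 16))
```

Different seeds start the same long sequence at different points, which is what allows sub-arrays loaded with different seeds to scramble their memory cells in mutually de-correlated orders.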

D. PGA

Fig. 12 shows the schematic of the PGA, which is implemented as a current-feedback instrumentation amplifier [17, 29] with a single-ended output. It consists of a differential pair of super source followers with a tunable source-degeneration resistor $R_S$, which acts as a linearized trans-conductor, and a current mirror with a constant load resistor $R_L$, which converts the trans-conductor’s output current to a voltage. The voltage gain of the PGA is defined by the ratio of both resistors $R_L/R_S$. $R_S$ is implemented as a switchable resistor array that ranges from 6 kΩ to 18 kΩ, while $R_L$ is constant (24 kΩ). Kelvin connections are used to eliminate errors caused by the on-resistance of the CMOS switches, which avoids the need for very large switches with a small on-resistance (Fig. 12). Compensation capacitors ($C_C$) are added to ensure the loop stability. A differential topology is applied to improve the PGA’s immunity to interference. The negative input terminal ($V_{in}$) is connected to the output of a replica delay-line buffer, whose input node is AC-coupled to ground while sharing the same DC bias voltage with the other buffers.

E. Cable driver

The cable driver is required to drive the output signal of each sub-array across a micro-coaxial cable with a capacitance of up to 300 pF. To maximize its power-efficiency, a class-AB super source follower [30], as depicted in Fig. 13, is adopted as the topology for the cable driver. Instead of using a high-impedance pseudo-resistor to form a quasi-floating gate, the gate of the PMOS transistor is only connected to the bias circuit during the TX phase, but kept floating during the RX phase, similar to the dynamic DC bias scheme used in the LNA.

IV. EXPERIMENTAL RESULTS

The ASIC has been realized in a 0.18 μm low-voltage CMOS process with a total area of 6.1 × 6.1 mm², as shown in Fig. 14(a). Fig. 14(b) presents a zoom-in view of one sub-array receiver that is matched to a 3 × 3 group of transducer elements with a pitch of 150 μm. While receiving, the ASIC consumes only 230 mW, which is less than half of the power budget for a 3-D TEE probe.

Fig. 15(a) shows a fabricated prototype with an integrated 32 × 32 PZT matrix transducer. The assembly has been bonded to a daughter PCB to facilitate acoustic measurements (Fig.
A matching layer and ground foil are applied on top of the PZT matrix. Bonding wires on the periphery of the ASIC are covered by a non-conductive epoxy layer for waterproofing.

The ASIC’s 96-channel sub-array outputs and 64-channel high-voltage transmit inputs are connected to a mother PCB via micro-coaxial cables with a length of 1.5 m. The mother PCB is directly mounted on a programmable imaging system (Verasonics Vantage system, Verasonics Inc., Redmond, WA), which acquires the RF data from the ASIC and drives high-voltage pulses to the transmit elements in the prototype transducer array. Including the required power-supply and digital control lines, the total number of cables required for connecting the ASIC to the imaging system is around 190.

Using this setup, the ASIC’s electrical and acoustic performance have been characterized experimentally, the results of which are presented in this section.

A. Electrical characterization

The electrical performance of the proposed LNA architecture has been fully characterized and evaluated with a separate test IC [18]. It demonstrates a 9.8 MHz bandwidth, an 81 dB dynamic range and an input-referred noise density of 5.5 nV/√Hz @ 5 MHz at its highest gain, while consuming only 0.135 mW per channel. When interfaced with an external, small PZT array that gives a receive sensitivity of about 10 μV/Pa, the LNA achieves a noise-efficiency factor (NEF) [31] that is 2.5 × better than the prior state-of-the-art [13].

Fig. 16 shows the measured transfer function of a 3 × 3 sub-array receiver in the ASIC, with a uniform delay pattern applied to the sub-array beamformer. Various combinations of LNA/PGA gain settings were applied to achieve a programmable mid-band gain ranging from –24 dB to 24 dB with a gain step of 6 dB. The measured absolute values of the gain levels are approximately 12 dB lower than the theoretical values of the LNA/PGA gain combinations, which can be attributed to the signal attenuation in the delay line buffers, cable drivers, and the attenuation associated with the parasitic capacitance at the beamformer’s summing node. This deviation does not deteriorate the imaging quality, as long as an adequate SNR can be maintained at the sub-array output by an appropriate selection of gain settings.

To investigate the output noise level of the sub-array receiver circuits, we use an ASIC without an integrated transducer matrix, in which all bond-pads for transducer interconnection are electrically shorted to ground by means of wire bonding. Fig. 17(a) shows the measured output noise spectrum without enabling the mismatch-scrambling function. The noise floor is in good agreement with expectations, but two interference tones appearing at fractions of the sampling frequency ($f_s/8$, $f_s/4$) rise above the noise floor and thus reduce the dynamic range. After enabling mismatch-scrambling (Fig. 17(b)), these tones are eliminated from the output spectrum, at the expense of a small increase in the noise floor.

The noise power reduction associated with the system-level beamforming has been measured by combining the sub-array output signals acquired using the Verasonics system. Fig. 18 shows the measured rms noise voltage after beamforming as a function of the number of sub-arrays. Ideally, if the noise at the outputs of the sub-arrays is uncorrelated, the noise power after beamforming should decrease inversely proportionally to the number of sub-arrays involved. Without mismatch-scrambling, this is not the case, because the sub-array output signals are dominated by (correlated) mismatch-related tones. With mismatch-scrambling enabled, the noise level shows the expected improvement, i.e. decreasing at a slope close to 10 dB/dec, provided that randomized seeds are delivered to the different pseudo-random number generators. With the same seed used in all sub-arrays, the tones disappear from the output spectrum, but the randomized mismatch signals of different sub-arrays are still correlated and hence are not reduced by the averaging operation of the system-level beamformer.

Table I summarizes the measured electrical performance of the ASIC. A system-level comparison with reported works on ASICs for 3-D ultrasound imaging is given in Table II. Our ASIC achieves both the best power-efficiency in receiving and the highest integration density.
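
The averaging argument used for Fig. 18 (noise power falling by 10 dB per decade of sub-arrays when the noise is uncorrelated, and not at all when it is fully correlated) can be illustrated with a small Monte Carlo sketch. It models the system-level beamformer as a plain average and the scrambling noise as unit-variance Gaussian noise, with identical or different generator seeds per sub-array; these are simplifying assumptions and the function name is illustrative.

```python
import math
import random


def averaged_noise_power_db(n_subarrays, shared_seed, n_samples=2000):
    """Noise power (dB relative to one sub-array) after averaging the outputs."""
    if shared_seed:
        generators = [random.Random(0) for _ in range(n_subarrays)]
    else:
        generators = [random.Random(index) for index in range(n_subarrays)]
    accumulated = 0.0
    for _ in range(n_samples):
        average = sum(g.gauss(0.0, 1.0) for g in generators) / n_subarrays
        accumulated += average * average
    return 10 * math.log10(accumulated / n_samples)


for count in (1, 10, 96):
    different = averaged_noise_power_db(count, shared_seed=False)
    same = averaged_noise_power_db(count, shared_seed=True)
    print(f"{count:3d} sub-arrays: different seeds {different:6.1f} dB, same seed {same:6.1f} dB")
```

With different seeds the power drops by close to 10 dB for every tenfold increase in the number of sub-arrays; with a shared seed every sub-array contributes the same sequence and averaging gives no reduction.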

B. Acoustic experiments

The fabricated prototype shown in Fig. 15 was immersed in a water tank (Fig. 19) for the evaluation of its acoustic performance. To measure the transmit efficiency of the center sub-array, all 64 TX elements were driven simultaneously by the Verasonics system and the pressure was measured at 5 cm using a hydrophone. With a 50 V excitation, a transmit pressure of 300 kPa was measured, leading to a transmit efficiency of about 6 kPa/V.

To characterize the receive beam-steering function of the ASIC, a single-element transducer of 0.5 inch diameter and 5 MHz center frequency (Olympus) was used as an external source, generating a quasi-continuous plane wave at the surface of the prototype transducer. The prototype was mounted on a rotating stage and turned from -50° to +50° in steps of 2°. The delays of the sub-array beamformers in the ASIC were programmed successively to steer the sub-arrays' maximum sensitivity towards 0°, -17° and -37°. The corresponding measured sub-array beam-profiles, shown in Fig. 20, are in good agreement with expectations, with the peaks of the beams corresponding well to the programmed steering angles.
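
For reference, the relative delays needed to steer a row of sub-array elements follow from plane-wave geometry. The sketch below is an illustration under stated assumptions, not the ASIC's delay-programming interface: it considers one row of three elements (steering in a single dimension), uses the 150 µm pitch from Table II, and assumes a speed of sound of 1500 m/s in water. The function name and sign convention are hypothetical.

```python
import math

PITCH_M = 150e-6              # element pitch, from Table II
SPEED_OF_SOUND_M_S = 1500.0   # assumed value for water


def steering_delays(angle_deg, n_elements=3,
                    pitch_m=PITCH_M, c=SPEED_OF_SOUND_M_S):
    """Non-negative per-element delays (s) aligning a plane wave from angle_deg.

    The angle is measured from the array normal. Element i is assumed to
    receive the wavefront at i * pitch * sin(angle) / c, and each element is
    delayed so that all signals line up with the latest arrival.
    """
    s = math.sin(math.radians(angle_deg))
    arrival = [i * pitch_m * s / c for i in range(n_elements)]
    latest = max(arrival)
    return [latest - t for t in arrival]


if __name__ == "__main__":
    for angle in (0.0, -17.0, -37.0):
        delays_ns = [round(d * 1e9, 1) for d in steering_delays(angle)]
        print(angle, "deg:", delays_ns, "ns")
```

Under these assumptions, adjacent elements need a delay difference of roughly 29 ns for 17° and roughly 60 ns for 37°.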

C. Imaging results

To demonstrate the 3-D imaging capability of the prototype, a pattern of seven point scatterers (six steel balls and one needle), forming a letter “M” (Fig. 21(a)), was placed at a distance of approximately 35 mm in front of the transducer array. A diverging wave was transmitted from the prototype, using a pulse of 18 V (peak-to-peak), generated by the Verasonics system and applied to the transmit sub-array through the connections on the ASIC. A 3-D volume image was reconstructed by combining the sub-array output signals recorded using the Verasonics system from multiple transmit-receive events, and rendered to obtain a frontal view of the point scatterers (Fig. 21(b)), which clearly shows the layout of the scatterers.

Currently, the 3-D image reconstruction is performed offline, and 169 transmit-receive events were used to generate the volume shown in Fig. 21(b). In a real-time scenario, this corresponds to a frame rate of 44.4 volumes per second at an imaging depth of 10 cm. When the daisy-chain mode for delay-pattern programming is enabled, the frame rate reduces to about 40 volumes per second. We have also found that volumes can be reconstructed from as few as 25 transmit-receive events, at the cost of slightly degraded image quality. This results in a frame rate of 300 volumes per second in the fast imaging mode.
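
The quoted volume rates are consistent with one transmit-receive event per acoustic round trip. The sketch below reproduces them under an assumed speed of sound of 1500 m/s, which is not stated explicitly in the text. The `overhead_s` parameter is a hypothetical placeholder for any additional per-event time, such as delay-pattern programming, and no value for it is taken from the source.

```python
SPEED_OF_SOUND_M_S = 1500.0   # assumed value


def volume_rate(n_events, depth_m=0.10, c=SPEED_OF_SOUND_M_S, overhead_s=0.0):
    """Volumes per second when each event lasts one round trip plus overhead."""
    round_trip_s = 2.0 * depth_m / c
    return 1.0 / (n_events * (round_trip_s + overhead_s))


if __name__ == "__main__":
    print("169 events:", round(volume_rate(169), 1), "volumes/s")
    print(" 25 events:", round(volume_rate(25), 1), "volumes/s")
```

Any non-zero per-event overhead lowers the rate below these values.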

V. CONCLUSIONS

A front-end ASIC with a co-integrated 32 × 32 PZT matrix transducer has been designed and implemented to enable next-generation miniature ultrasound probes for real-time 3-D transesophageal echocardiography. The transducer array is split into a transmit and a receive sub-array to facilitate the power and area optimization of the ASIC. To address the critical challenge of cable-count reduction, sub-array receive beamforming is realized in the ASIC with a highly compact and power-efficient circuit-level implementation, which utilizes the mismatch-scrambling technique to optimize the dynamic range. A power- and area-efficient LNA architecture is proposed to further optimize the performance. Based on these techniques, the ASIC demonstrates state-of-the-art power and area efficiency, and has been successfully applied in 3-D imaging experiments.

ACKNOWLEDGMENT

This research is supported by the Dutch Technology Foundation STW, which is part of the Netherlands Organisation for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs.

REFERENCES


LIST OF FIGURES

Fig. 1: A miniature 3-D TEE probe with a front-end ASIC.

Fig. 2: Transducer matrix configuration.

Fig. 3: Schematic of the $3 \times 3$ sub-array receiver.

Fig. 4: The measured impedance of a $130 \, \mu m \times 130 \, \mu m$ PZT transducer element (i.e. $150 \, \mu m$ pitch with $20 \, \mu m$ kerf) and its equivalent electrical model.

Fig. 5: The proposed LNA architecture.

Fig. 6: Inverter-based OTA with split-capacitor feedback network.

Fig. 7: Dynamic bias control scheme.

Fig. 8: Complete schematic of the LNA.

Fig. 9: Schematic and timing-diagram of the sub-array beamformer.

Fig. 10: Operation principle of mismatch-scrambling.

Fig. 11: Circuit implementation of the delay line control logic with mismatch-scrambling.

Fig. 12: Schematic of the PGA.

Fig. 13: Schematic of the cable driver.

Fig. 14: (a) Micro-photograph of the ASIC; (b) Floor plan of one sub-array receiver. Bond-pads for transducer interconnection are implemented on top of the other circuits in the top layer metal.

Fig. 15: (a) Photograph of a prototype ASIC with integrated $32 \times 32$ PZT matrix; (b) A prototype bonded on a daughter PCB for acoustic experiments.

Fig. 16: Measured transfer functions of the ASIC with different combinations of LNA/PGA gain settings.

Fig. 17: Measured noise spectrum of the summed output of 24 sub-arrays without (a) and with (b) mismatch-scrambling. LNA gain = 6 dB and PGA gain = 6 dB.

Fig. 18: Measured rms noise voltage after post-beamforming as a function of the number of sub-arrays. Noise is integrated over a bandwidth of 2.5 MHz - 7.5 MHz. LNA gain = 6 dB and PGA gain = 6 dB.

Fig. 19: Schematic diagram of the acoustic experiment setup. For the beam-steering measurements and the characterization of transmit pressure, scatterers were replaced by single-element transducers and a hydrophone, respectively.

Fig. 20: Measured sub-array beam-profile for steering angles of 0°, 17° and 37°.

Fig. 21: (a) The pattern of 7 point scatterers including 6 steel balls (gray circles) and 1 needle (the dotted circle); (b) volume-rendered 3-D image. Minor side-lobes appear around the original echo of each point scatterer, which are related to the layout of transducer elements.

LIST OF TABLES

I. ASIC performance summary
II. System-level comparison with prior works
TABLE I. ASIC PERFORMANCE SUMMARY

<table>
<thead>
<tr>
<th></th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>RX</td>
<td>Supply voltage</td>
<td>Analog: 1.8 V; Digital: 1.4 V</td>
</tr>
<tr>
<td></td>
<td>Total power</td>
<td>228.9 mW (Analog: 190 mW; Digital: 38.9 mW)</td>
</tr>
<tr>
<td></td>
<td>-3 dB Bandwidth</td>
<td>6 MHz</td>
</tr>
<tr>
<td></td>
<td>Input-referred noise density @ 5 MHz</td>
<td>w/o mismatch-scrambling: 1.8 mPa/√Hz; with mismatch-scrambling: 3.6 mPa/√Hz (worst case*)</td>
</tr>
<tr>
<td></td>
<td>RX sensitivity</td>
<td>~ 5 µV/Pa @ LNA input</td>
</tr>
<tr>
<td></td>
<td>Gain steps</td>
<td>-12/-6/0/6/12/18/24/30/36 dB</td>
</tr>
<tr>
<td></td>
<td>HD2</td>
<td>43 dBc @ 300 mV peak-to-peak output, 5 MHz</td>
</tr>
<tr>
<td>TX</td>
<td>Max. peak-to-peak TX pulse voltage</td>
<td>50 V</td>
</tr>
<tr>
<td></td>
<td>TX efficiency</td>
<td>~6 kPa/V @ 5 cm</td>
</tr>
</tbody>
</table>

*The measured input-referred noise with the mismatch-scrambling function enabled varies with different delay patterns because of a systematic mismatch in the layout of S/H delay lines, which could be optimized by a better layout.

TABLE II. SYSTEM-LEVEL COMPARISON WITH PRIOR WORKS

<table>
<thead>
<tr>
<th></th>
<th>[5]</th>
<th>[12]</th>
<th>[9]</th>
<th>[10]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>1.5 µm HV</td>
<td>0.25 µm HV</td>
<td>0.18 µm HV</td>
<td>0.18 µm LV</td>
<td>0.18 µm LV</td>
</tr>
<tr>
<td>Transducer</td>
<td>CMUT</td>
<td>CMUT</td>
<td>CMUT</td>
<td>PZT</td>
<td>PZT</td>
</tr>
<tr>
<td>Array size</td>
<td>16 × 16</td>
<td>32 × 32</td>
<td>16 × 16</td>
<td>9 × 12</td>
<td>32 × 32</td>
</tr>
<tr>
<td>Center freq.</td>
<td>5 MHz</td>
<td>5 MHz</td>
<td>5 MHz</td>
<td>5 MHz</td>
<td>5 MHz</td>
</tr>
<tr>
<td>Element Pitch</td>
<td>250 µm</td>
<td>250 µm</td>
<td>250 µm</td>
<td>200 µm</td>
<td>150 µm</td>
</tr>
<tr>
<td>Pitch ≤ λ/2?</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Beamform Function</td>
<td>TX</td>
<td>TX</td>
<td>Off-chip</td>
<td>RX Sub-array</td>
<td>RX Sub-array</td>
</tr>
<tr>
<td># of TX el.</td>
<td>256</td>
<td>960</td>
<td>256</td>
<td>N/A</td>
<td>64</td>
</tr>
<tr>
<td># of RX el.</td>
<td>32</td>
<td>64</td>
<td>256</td>
<td>81</td>
<td>864</td>
</tr>
<tr>
<td>Imaging depth</td>
<td>7.5 cm</td>
<td>7.5 cm</td>
<td>7.5 cm</td>
<td>N/A</td>
<td>10 cm</td>
</tr>
<tr>
<td>Volume rate</td>
<td>5 volume/s</td>
<td>5 volume/s</td>
<td>62.5 volume/s</td>
<td>N/A</td>
<td>40 – 300 volume/s</td>
</tr>
<tr>
<td>Integration method</td>
<td>Flip-chip bonding Via Interposer</td>
<td>Flip-chip bonding Via Interposer</td>
<td>Direct Integration</td>
<td>Direct Integration</td>
<td></td>
</tr>
<tr>
<td>ASIC size</td>
<td>10 × 6 mm²</td>
<td>9.2 × 9.2 mm²</td>
<td>6 × 5.5 mm²</td>
<td>3.2 × 3.8 mm²</td>
<td>6.1 × 6.1 mm²</td>
</tr>
<tr>
<td>RX power/el.</td>
<td>9 mW</td>
<td>4.5 mW</td>
<td>1.4 mW</td>
<td>0.44 mW</td>
<td>0.27 mW</td>
</tr>
</tbody>
</table>

*Limited by the PCIe interface between Verasonics and the host PC.