An Ultrasound Receiver ASIC Employing Compressive Sensing

by Farzad Mirzaei

to obtain the degree of Master of Science in Microelectronics at the Delft University of Technology, to be defended publicly on Thursday September 27th, 2018.

Student Number: 4633806
Thesis committee:
- dr.ir. M.A.P. Pertijs, TU Delft, supervisor
- prof.dr.ir. G.J.T. Leus, TU Delft
- dr. F. Sebastiani, TU Delft
- dr. P. Kruizinga, Erasmus MC
- Ir. Z. Chen, TU Delft

An electronic version of this thesis is available at http://repository.tudelft.nl/.
Acknowledgement

I would like to thank a number of people who helped and supported me through this one-year journey.

My gratitude goes first and foremost to my supervisor Dr. Michiel Pertijs, an outstanding person, mentor, and tremendously talented scientist who gave me the opportunity to work in his group and learn from him not only to become a better engineer but also to grow as an individual.

I would especially like to express my gratitude to my daily supervisor Zhao Chen, who helped me endlessly throughout this work. He always made time for my questions and helped me with sheer patience. He made numerous insightful remarks on my work and offered a lot of help throughout the course of this project. If it had not been for him, I would not have been able to finish my work.

I would like to thank Javad in particular, an excellent friend and a good engineer who helped me to a great extent during this year.

My gratitude also goes to Pieter who helped me a great deal during my work and also Pim who invested a lot of time in answering my questions and assisted me in writing this thesis.

A special thanks goes to my sincere and brilliant friend Said, who has assisted me from time to time. Despite being engaged most of the time, he would always offer his help.

I would like to acknowledge Mingliang, who helped me every now and then throughout the project. I would also like to thank the friendly people of the EI Lab, especially Jing, Douwe, Rushil, Hui, Cristian, Valeria, Zu Yao, and Lukasz.

Finally, I would like to thank my parents and my dear uncle, who supported me in each and every aspect during the two years of my master studies. Had it not been for their utter and absolute support, this journey would have been impossible.

Farzad Mirzaei
Delft, September 2018
# Table of Contents

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ABSTRACT

1. INTRODUCTION
   1.1. BACKGROUND
   1.2. PRIOR ART
   1.3. DESIGN OBJECTIVE
   1.4. THESIS ORGANIZATION

2. INTRODUCTION: COMPRESSIVE SENSING
   2.1. COMPRESSIVE SENSING
   2.2. COMPRESSIVE SENSING APPLICATION IN ULTRASOUND
   2.3. COMPRESSIVE SCHEME IN ULTRASOUND HARDWARE DESIGN
   2.4. CONCLUSION

3. SYSTEM DESIGN
   3.1. PZT BEHAVIOR
   3.2. SYSTEM SPECIFICATIONS
   3.1. ARCHITECTURE DESIGN
       3.1.1. Charge summation
       3.1.2. Voltage summation
       3.1.3. Current summation
   3.2. DESIGNED SYSTEM
       3.2.1. LNA Architecture
       3.2.2. Trans-conductance amplifier Architecture
       3.2.3. TIA Architecture
       3.2.4. ADC Architecture
   3.3. COMPRESSIVE SCHEME IMPLEMENTATION
       3.3.1. LFSR Design

4. CIRCUIT IMPLEMENTATION AND SIMULATION RESULTS
   4.1. LNA
       4.1.1. Circuit implementation
       4.1.2. Layout
       4.1.3. Simulation results
   4.2. TRANS-CONDUCTANCE AMPLIFIER
       4.2.1. Circuit implementation
       4.2.2. Layout
       4.2.3. Simulation results
           4.2.3.1. First gain setting (5kΩ resistor)
           4.2.3.2. Second gain setting (1kΩ resistor)
   4.3. TRANS-IMPEDANCE AMPLIFIER (TIA)
       4.3.1. Circuit implementation
       4.3.2. Layout
       4.3.3. Simulation results
   4.4. SYSTEM SIMULATIONS
   4.5. POWER BREAKDOWN

5. CHIP-LEVEL DESIGN
   5.1. CURRENT GENERATION
   5.2. BINARY-TO-ONE-HOT ENCODER
   5.3. ADC VOLTAGE REFERENCE
   5.4. CLOCK
       5.4.1. Planning and unit design
       5.4.2. Generation
       5.4.3. Output latch and multiplexing
       5.4.4. Distribution
       5.4.5. Non-overlapping clock
   5.5. SHIFT REGISTER PROGRAMMING
   5.6. POWER DOMAINS
   5.7. FLOOR PLAN AND FULL CHIP
   5.8. INTERCONNECTIONS
   5.9. TEST CHIP MALFUNCTION AND SOLUTION
   5.10. EXPERIMENTAL PLANNING
   5.11. CONCLUSION

6. CONCLUSION
   6.1. THESIS CONTRIBUTION
   6.2. FUTURE WORK

LIST OF ABBREVIATIONS

REFERENCES
List of Figures

FIGURE 1.1 TRANSTHORACIC ECHOCARDIOGRAPHY (TTE) [4]
FIGURE 1.2 (A) TRANSESOPHAGEAL ECHOCARDIOGRAPHY (TEE) [9] (B) INTRACARDIAC ECHOCARDIOGRAPHY [10]
FIGURE 1.3 SYSTEM OVERVIEW
FIGURE 2.1 SIGNAL ACQUISITION WITH AND WITHOUT COMPRESSIVE MASK
FIGURE 2.2 ULTRASOUND CODED MASK
FIGURE 2.3 ARCHITECTURE 1 OF GROUPING [15]
FIGURE 2.4 ARCHITECTURE 2 OF GROUPING [12]
FIGURE 2.5 ARCHITECTURE 3 OF GROUPING [12]
FIGURE 2.6 ORIGINAL CYST TO BE RECONSTRUCTED [13]
FIGURE 2.7 PHANTOM RECONSTRUCTION USING 1 PULSE-ECHO MEASUREMENT [17]
FIGURE 2.8 PHANTOM RECONSTRUCTION USING 2 PULSE-ECHO MEASUREMENTS [17]
FIGURE 2.9 POINT REFLECTORS RECONSTRUCTION – 2 PULSE-ECHO MEASUREMENTS [17]
FIGURE 2.10 COMPRESSIVE SYSTEM ARCHITECTURE FOR 32 ELEMENTS
FIGURE 3.1 "BUTTERWORTH-VAN DYKE" PZT MODEL
FIGURE 3.2 "BUTTERWORTH-VAN DYKE" PZT MODEL (150mm×1500mm) AND EQUIVALENT CIRCUIT AT RESONANCE
FIGURE 3.3 TRANSDUCER IMPEDANCE MAGNITUDE
FIGURE 3.4 SYSTEM ARCHITECTURE
FIGURE 3.5 32X1 LINEAR ARRAY FLOORPLAN
FIGURE 3.6 SIGNAL CHAIN ARCHITECTURE FOR A GROUP OF 8
FIGURE 3.7 LNA BLOCK DIAGRAM
FIGURE 3.8 INVERTER-BASED LNA
FIGURE 3.9 OTA NOISE TRANSFER FUNCTION
FIGURE 3.10 NTF IN THE PRESENCE OF PARASITIC CAPACITANCE
FIGURE 3.11 A COMMON-SOURCE (VF)
FIGURE 3.12 VOLTAGE FOLLOWER WITH SERVO LOOP
FIGURE 3.13 CASCODED FLIPPED VOLTAGE FOLLOWER (CASFVF)
FIGURE 3.14 DIFFERENTIAL TRANS-CONDUCTANCE AMPLIFIER WITH KELVIN CONNECTIONS
FIGURE 3.15 TRANS-CONDUCTANCE AMPLIFIER
FIGURE 3.16 COPY CELLS AND CHOPPER SWITCHES
FIGURE 3.17 TIA WITH FEEDBACK NETWORK
FIGURE 3.18 TIA AND PARASITIC CAPACITANCES
FIGURE 3.19 TELESCOPIC CASCODE AND INVERTER-BASED CONFIGURATIONS
FIGURE 3.20 TIA FEEDBACK NETWORK
FIGURE 3.21 RANDOM SEQUENCE GENERATOR
FIGURE 3.22 10-BIT LFSR HISTOGRAM
FIGURE 3.23 10-BIT LFSR SCATTER PLOT
FIGURE 4.1 LNA CONFIGURATION
FIGURE 4.2 DC SERVO LOOP FOR LNA
FIGURE 4.3 LNA LAYOUT
FIGURE 4.4 FREQUENCY RESPONSE OF LNA (PRE-LAYOUT SIMULATION)
FIGURE 4.5 FREQUENCY RESPONSE OF LNA (POST-LAYOUT)
FIGURE 4.6 STEP RESPONSE OF LNA (PRE-LAYOUT)
FIGURE 4.7 STEP RESPONSE OF LNA (PRE-LAYOUT)
FIGURE 4.8 TRANS-CONDUCTANCE STAGE
FIGURE 4.9 TRANS-CONDUCTANCE STAGE LAYOUT
List of Tables

**TABLE 1.1** SIMILAR DESIGN COMPARISON TABLE
**TABLE 2.2** CNR FOR DIFFERENT METHODS
**TABLE 3.1** TRANSDUCER MODEL COMPONENT VALUES
**TABLE 3.2** OVERALL SPECIFICATIONS
**TABLE 4.1** LNA TRANSISTOR SIZES
**TABLE 4.2** NOISE PERFORMANCE OF THE LNA (RMS VALUES)
**TABLE 4.3** THD VALUES FOR PRE- AND POST-LAYOUT SIMS
**TABLE 4.4** THD VALUES FOR PRE- AND POST-LAYOUT SIMULATIONS (5kΩ)
**TABLE 4.5** TIA NOISE REPORT
**TABLE 4.8** THD VALUES FOR PRE- AND POST-LAYOUT SIMULATIONS OF TIA
**TABLE 5.1** CGC UNIT TRUTH TABLE
**TABLE 5.2** CHIP POWER DOMAINS
Abstract

This work introduces an architecture that is capable of reducing the number of cables coming out of an ultrasound receiver ASIC by a substantial factor without dropping the frame-rate. It employs a newly developed technique named compressive sensing to exploit the ultrasound signal redundancies in the spatial domain.

The signals from 32 receive paths are amplified, multiplied by random weights, summed in groups of 8 elements, and digitized using 4 charge-sharing SAR ADCs. A 100MHz clock is used on the chip to time-multiplex the outputs of the 4 ADCs onto a 10-bit parallel output. The ASIC consists of three main parts: (1) a low-noise amplifier and trans-conductor, (2) a summation node and ADC, and (3) the digital programming circuitry and control signals.

The analog front-end (AFE) consumes 1.1 mW per channel, or 1.5 mW per channel when the power consumption of the SAR ADC is included. The received signal has a center frequency of 5MHz with a 50% bandwidth and is sampled at a rate of 25MHz.

A prototype chip has been fabricated in TSMC 0.18µm LV technology. Post-layout simulation results of this chip are presented in this thesis. The design is element-matched to a linear array of 32 PZT elements with 150µm pitch. The chip is rectangular shaped with dimensions of 5mm × 1mm.

Keywords: Compressive sensing, Cable count reduction, Current summation, Random-weighting Trans-conductance amplifier
1. Introduction
1.1. Background

Real-time high-resolution ultrasound imaging is nowadays a powerful tool for the diagnosis of heart disease. According to a report by the US Department of Health and Human Services [1], heart disease was the leading cause of death in 2015. All heart-related conditions require adequate monitoring for preventive medical care and disease treatment.

One of the earliest ways to diagnose heart conditions is electrocardiography (ECG). ECG is a non-invasive way to monitor the heart by placing electrodes on the skin and recording the heart activity [2]. Echocardiography is another method that employs ultrasound waves to create an image of the heart and also assess the blood flow through the heart [3].

Figure 1.1 Transthoracic Echocardiography (TTE) [4]

Figure 1.1 shows an image of Transthoracic Echocardiography (TTE). In this type of imaging, a probe is placed on the chest and the ultrasound wave has to travel across the skin and ribcage, pass through the lungs, reach the heart, and then return along the same path to the probe [5]. To obtain more information from ultrasound images, Transesophageal Echocardiography (TEE) probes can be used, in which case the patient has to swallow a miniature ultrasound probe [6].

Figure 1.2 shows two instances of endoscopy- or catheter-based probes for monitoring the heart. Figure 1.2(a) shows a TEE probe that enters the esophagus and reaches the back side of the heart, and Figure 1.2(b) shows an Intracardiac Echocardiography (ICE) probe that enters the heart and gives higher-quality images than TEE probes [7]. Other types of catheter-based probes are also used for medical purposes, such as Intravascular Ultrasound (IVUS) probes, which are mainly used to visualize the arteries of the heart [8]. Conventional probes use a linear array of transducer elements to make a 2D cross-sectional image. 3D imaging requires a matrix transducer, which typically consists of 1000+ elements. These elements cannot be connected using individual cables, which calls for the integration of an ASIC in the probe tip.

This type of imaging is subject to several constraints. These probes are invasive (however minimally) and require the catheter to enter the human body through a channel. The power consumption must stay below a limit, because the dissipated power heats the surrounding tissues and organs; above a certain threshold, this can cause tissue damage. In addition, the catheter diameter is directly affected by the number of cables connected to the ASIC, which becomes a bottleneck especially for 3D imaging, where a matrix of transducers is used. Hence, the main challenges in designing an ASIC for ultrasound imaging are the power budget and cable count reduction.

1.2. Prior art

A number of existing works try to reduce the number of cables connected to the ASIC. In [11], wire reduction is achieved through sub-array beamforming: a delay-and-sum beamforming scheme is applied to 9 adjacent receive elements to reduce the amount of transmitted data. The design presented in [12] takes a different approach; only 4 cables are connected to the ASIC, and in each pulse-echo cycle only one element receives and has its data fed out. In [13] [14], by contrast, a time-division multiplexing approach has been chosen. In [13], a high-frequency clock (200MHz) is fed to the chip, which makes it possible to sample 8 channels at 25MS/s each.

1.3. Design objective

All of the prior works discussed above essentially sacrifice the frame rate. This design, however, aims to reduce the cable count without dropping the frame rate. To this end, this work adopts a new method of data compression developed by X. Li, a former TU Delft Master student, in [15] and by P. van der Meulen in [16] [17]. A summary of their work and its results is presented in chapter 2 of this thesis. This method, named compressive sensing, promises an 8-to-1 cable count reduction without losing frame rate. Table 1.1 compares similar designs that aim to reduce the cable count.
Table 1.1 Similar design comparison table

<table>
<thead>
<tr>
<th></th>
<th>[11]</th>
<th>[12]</th>
<th>[13]</th>
<th>This work (Target)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Process</strong></td>
<td>0.18µm LV CMOS</td>
<td>0.18µm HV CMOS</td>
<td>0.18µm HV CMOS</td>
<td>0.18µm LV CMOS</td>
</tr>
<tr>
<td><strong>Cable reduction scheme</strong></td>
<td>Sub-Array Beam Forming</td>
<td>Receive Cycle Multiplexing</td>
<td>Time Division Multiplexing</td>
<td>Compressive sensing</td>
</tr>
<tr>
<td><strong>Center frequency</strong></td>
<td>5MHz</td>
<td>10MHz</td>
<td>7MHz</td>
<td>5MHz</td>
</tr>
<tr>
<td><strong>Transducer</strong></td>
<td>PZT</td>
<td>PZT</td>
<td>PZT/CMUT</td>
<td>PZT</td>
</tr>
<tr>
<td><strong>Number of receive elements</strong></td>
<td>864</td>
<td>64</td>
<td>64</td>
<td>32</td>
</tr>
<tr>
<td><strong>Cable reduction ratio / number of pulse-echo acquisitions</strong></td>
<td>9x/25x<sup>1</sup></td>
<td>64x/64x</td>
<td>8x/1x</td>
<td>8x/1x</td>
</tr>
<tr>
<td><strong>Power consumption per channel</strong></td>
<td>0.27mW</td>
<td>10mW<sup>†</sup></td>
<td>6.26mW</td>
<td>&lt;1mW</td>
</tr>
</tbody>
</table>

<sup>1</sup> The number of acquisitions is shown in [48].

<sup>†</sup> This number represents the power consumption per receive cycle for the front end and ADC.

All of the other works listed in the table above have a digitized output, except for [13]. This work also aims to digitize the output signal before feeding it out.

A schematic of the proposed system can be seen in Figure 1.3 where 32 cables from the 32 receive elements enter and only 4 wires leave the chip.

Figure 1.3 System overview
1.4. Thesis organization

The thesis is organized as follows.

The second chapter introduces compressive sensing theory and the scheme that is designed for the hardware implementation. Furthermore, a practical system has been simulated and its results are presented. In the third chapter, the system specifications are derived and the system blocks are chosen. A suitable design is selected for each block. The auxiliary circuits are also presented there.

Chapter four focuses on the presentation and verification of the results. Each block has been fully simulated, and the results are compared against the specifications derived in chapter three. The block-level layouts are also included in this chapter. Chapter five then describes the elements that are considered in the chip-level design. This chapter also features photos of the chip.

Finally, chapter six concludes this thesis and offers some suggestions for future projects.
2. Introduction: Compressive Sensing
Ultrasound systems generate and transmit a large amount of data, much of which may be redundant and can be discarded. Ultrasonic waves exhibit considerable sparsity in both the time and spatial domains. This, together with recent research in compressive sensing, motivates an economical approach that reduces the amount of data that is collected and processed.

Typical signal reconstruction approaches rely on the well-known Shannon theorem: the sampling rate must be at least twice the maximum frequency in the signal [18]. Recently, however, a theory named compressive sampling has been developed, which requires far less data for signal reconstruction than typical Nyquist-rate sampling. This enables faster imaging. Lately, several investigations have been conducted into exploiting compressive sensing in medical imaging.

The following is a short introduction to compressive sampling and how it has been linked to medical imaging.

2.1. Compressive sensing
Compressive Sensing (CS), or compressive sampling, states that under certain assumptions a signal can be reconstructed from far fewer samples than Shannon's theorem requires [19] [20]. These assumptions are:

a) Sparsity: which relates to the sampled signal
b) Incoherence: which concerns the sensing method

A signal is sparse if it has only a few non-zero coefficients when represented in a proper basis ($\Psi$) [20]. The same signal can be dense in another basis; for instance, a Dirac function is sparse in the time domain while it occupies the full frequency spectrum. While a signal should have a sparse representation in some basis, it should not be sparse in the sampling domain [21]; there should exist incoherence between the two domains. The reason is that each sample should carry information about the entire signal, so that the signal can be reconstructed from only a few samples.
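The Dirac example can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the signal length of 64 and the impulse position are arbitrary values chosen for illustration and do not come from the thesis. It counts the non-zero coefficients of a discrete impulse in the time domain and in the frequency domain.

```python
import numpy as np

n = 64                 # signal length (arbitrary, illustration only)
x = np.zeros(n)
x[10] = 1.0            # discrete Dirac impulse at sample 10 (arbitrary position)

# Representation of the same signal in the frequency (Fourier) basis
X = np.fft.fft(x)

# Count coefficients that are significantly different from zero in each basis
nonzero_time = np.count_nonzero(x)
nonzero_freq = np.count_nonzero(np.abs(X) > 1e-12)

print("non-zero time samples:   ", nonzero_time)
print("non-zero frequency bins: ", nonzero_freq)
```

The impulse has one non-zero time sample, whereas every frequency bin has unit magnitude. This is the sense in which the two domains are incoherent: a measurement taken in one domain carries information about every coefficient in the other.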

2.2. Compressive sensing application in ultrasound
Compressive sensing has recently inspired a lot of work, especially in optics [21] [22] [23] [24]. Ultrasound imaging acquires the image using the same techniques. Hence, compressive sensing can also be implemented in ultrasound hardware. A concise study of the approaches to using this technique in ultrasound can be found in [25], which also investigates sparsifying domains and tries to find the domain in which the error is minimized.

A novel work by P. Kruizinga et al. [26] has adopted an approach similar to the one used in single-pixel imaging [23]. Named compressive sensing, it creates randomness in the received signals, but it does not necessarily use compressive sampling as introduced by E. J. Candès [19] and described at the beginning of this chapter. In this method, the compression actually happens before sampling [27], where the received signals are differentiated using a single pixel.

As can be seen in Figure 2.1, two signals are reflected from two different points. On the right side of the figure, a pixel without a mask receives the sum of the two reflections. There is little contrast between the signals, so one cannot distinguish between the two objects. On the left side of Figure 2.1, however, each signal arrives at a slightly different point on the mask and sees a different thickness. Different thicknesses result in different time delays applied to each signal, and if the delays of the mask are random enough, one can reconstruct the image with fewer samples.
Figure 2.1 Signal acquisition with and without compressive mask

Taking a closer look at the mask (see Figure 2.2) [28], each segment has a different thickness. In this figure, $x$ represents the ultrasound signal, $t$ represents the time that the wave takes to travel from one edge of the mask to the other, and $A$ denotes the weight applied to the received signal on each path, which is a function of $t$. Ultrasound waves thus propagate through this mask along different paths. The longer a path is, the longer the delay the signal sees; the delay increases linearly with thickness. The output signal is the sum of all these responses.

\[
y_{\text{total}} = \sum A_i(t_i)x
\]

Because these thicknesses are chosen randomly, the similarity between the signals is reduced, which is beneficial for the reconstruction of the image. See [26] for a reconstruction of the two letters “D” and “E” using a single transducer and a plastic coding mask.
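The effect of the mask can be sketched numerically. The Python example below is an illustration under stated assumptions, not the model used in [26]: each mask segment is assumed to act as a pure delay of a whole number of samples with a constant weight, and the function name `mask_response`, the 100 MHz sampling rate, the two-cycle 5 MHz pulse, and the 16 segments are all hypothetical choices made for this sketch.

```python
import numpy as np

def mask_response(x, delays, weights):
    """Sum of weighted, delayed copies of x: y[n] = sum_i A_i * x[n - d_i].

    delays are whole numbers of samples (assumption: each mask segment
    behaves as a pure delay proportional to its thickness).
    """
    y = np.zeros(len(x) + int(max(delays)))
    for d, a in zip(delays, weights):
        d = int(d)
        y[d:d + len(x)] += a * x
    return y

rng = np.random.default_rng(0)

fs = 100e6                       # sampling rate in Hz (assumed)
f0 = 5e6                         # pulse center frequency in Hz (assumed)
t = np.arange(40) / fs           # 40 samples = two cycles at 5 MHz
x = np.sin(2 * np.pi * f0 * t) * np.hanning(len(t))   # windowed pulse

n_segments = 16                                  # hypothetical segment count
delays = rng.integers(0, 50, size=n_segments)    # random delay per segment
weights = np.ones(n_segments)                    # equal weights (assumed)

y = mask_response(x, delays, weights)
print(len(x), len(y))
```

With random delays, the summed output for a reflector at one position differs from that for a reflector at another position, which is what the reconstruction relies on.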

This relatively short introduction explained how compressive sampling inspired several works in optics. Later on, the same methods were adopted in ultrasound imaging with an important difference: the data is compressed before sampling rather than during sampling.

From now on throughout this thesis, the term compressive sensing is used to refer to the method introduced by P. Kruizinga [26] and not the theorem that was introduced by E. J. Candès [19].

2.3. Compressive scheme in ultrasound hardware design

Ultrasound systems typically have a large number of connections. A linear array of elements is used to acquire 2D images, and a matrix of transducer elements can be used for 3D imaging. A linear array of transducers could consist of as many as 30 elements, and a matrix transducer typically has the square of this number of elements (>900). Connecting this many elements can become a challenge in ultrasound hardware design, and it should be addressed with a proper cable-count reduction solution. A few approaches to reducing the cable count were introduced in chapter 1. In this chapter, however, the aim is to adopt the method described in the previous section for ultrasound imaging; we aim to replace a single element with a linear array of transducer elements and implement the mask using electronic hardware.

Implementing CS in hardware requires a feasible and realistic design. This was the thesis subject of Xuyang Li, a former master's student in the CAS group of TU Delft [15]. His goal was to design a feasible imaging scheme, verify the design method in the presence of noise, and assess the performance of the system with respect to uncompressed sensing.

X. Li has investigated a relatively wide range of architectures, showing that some approaches are more effective than others. Complementary to X. Li’s work, P. van der Meulen has carried out additional simulations and narrowed down his research on two specific architecture choices [16][17]. In this section, a brief summary of P. van der Meulen’s key findings alongside the course that X. Li has set during his thesis will be presented.

In primary simulations, three methods are applied to add randomness to the signals: a) Sample shift, b) phase shift, and c) amplitude gain. The signals are shifted sample-wise on a random basis in discrete levels, shifted randomly in the phase domain, or multiplied by a random gain weight respectively. Also, 64 receive elements have been considered to send plane waves and receive the reflections. The receive channels are shrunk to 8 paths.

The arrangement of the 64 elements into 8 groups of 8 is an open question. One can also exploit this arrangement to the benefit of compressive sensing (add randomness to the signals). Hence, the author, X. Li, has decided to consider three choices. In Architecture 1 (See Figure 2.3), every 8 adjacent elements are grouped, which is basically 8 receivers operating in parallel with 8 channels each.

![Figure 2.3 Architecture 1 of grouping](image)

In Architecture 2 named as **Rnd grouping**, shown in Figure 2.4, the connections are chosen randomly and can change after each sampling.
In Architecture 3, shown in Figure 2.5, groups consist of an interleaved selection of elements and elements are connected sequentially.
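The three grouping options can be sketched as index lists. This is only an illustration: the function names are hypothetical, and the exact interleaving used in Architecture 3 is assumed here to be "every 8th element", which the text does not spell out.

```python
import random

def adjacent_groups(n_elements=64, n_groups=8):
    """Architecture 1: every 8 adjacent elements form one group."""
    size = n_elements // n_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(n_groups)]

def random_groups(n_elements=64, n_groups=8, rng=None):
    """Architecture 2 (Rnd grouping): a fresh random partition, e.g. per sample."""
    rng = rng or random.Random()
    order = list(range(n_elements))
    rng.shuffle(order)
    size = n_elements // n_groups
    return [order[g * size:(g + 1) * size] for g in range(n_groups)]

def interleaved_groups(n_elements=64, n_groups=8):
    """Architecture 3 (assumed form): group g takes elements g, g+8, g+16, ..."""
    return [list(range(g, n_elements, n_groups)) for g in range(n_groups)]
```

Calling `random_groups` once per sampling instant reproduces the idea of connections that can change after each sample.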

X. Li’s approach to the compressive sensing can be described in short: the author introduces three ways of adding randomness to the signals and also suggests three architectures to connect the elements. He interfaces each scheme to every architecture separately (basically creates all of the possible combinations) and reports the results by demonstrating the images that are reconstructed with that method.

Later on, he adds another way of adding randomness to the signal, named random sub-sampling. In random-subsampling, the sampling period itself is fractioned into sub-sections and one of the samples is randomly selected in that time window and the rest are discarded. Furthermore, in a new series of simulations using architecture 2 (Figure 2.4), he changes the configuration of the elements on a sample-wise basis (Rnd grouping).

Performance of each design was compared to the raw data from 64 sensors without pre-processing for the image reconstruction. To summarize, random sub-sampling is not a fitting candidate; amplitude weight and random grouping sample by sample are the most promising choices.
For detailed results, comparison tables and figures see Chapter 4, Appendix A, and Appendix B from [15].

P. van der Meulen has performed more simulations in which he exactly replicates the test conditions of the planned prototype chip, and tries and suggests a scheme that proves to be working. In his simulations he reduces the number of receive elements to 32; the same number that is intended to be included in the ASIC prototype. These are the simulation conditions [16] [17]:

- Center frequency of excitation: $5 MHz$
- SNR of 40dB
- Sampling frequency of $25 MHz$; 4 times the transducer low pass cut-off frequency

And there are two experiments for reconstruction of: a) A cyst (See Figure 2.6) b) Eight point reflectors (the phantom image is not available)

Figure 2.6 Original Cyst to be reconstructed [13]

Figure 2.7 top left shows the reconstruction of the cyst phantom using 4 uniformly spaced channels. Figure 2.7 top center, top right and bottom figures all use the architecture 2 interface. Figure 2.7 top center shows random weight compression reconstruction, the top right figure shows random sample-wise grouping reconstruction, and the bottom figure shows random weights + sample-wise grouping of architecture 2.
As can be seen in Figure 2.7 the cyst phantom is visible, however the contrast is very low; hence, two pulse-echo measurements are used to reconstruct the cyst.

Figure 2.8 shows the results for the same configurations and schemes that are shown in Figure 2.7. The difference is that 2 pulse-echo measurements are used. Figure 2.8 top left shows the reconstruction using 8 uniformly spaced channels.

In Figure 2.8, compressive methods evidently show a better reconstruction and display a high contrast with respect to multiplexing reconstruction method (all of the schemes acquire the same amount of data).

In order to compare the image reconstructions, a measure named Contrast-to-Noise Ratio (CNR) was used; it is defined as the ratio of the mean absolute pixel value outside the anechoic region to the absolute mean inside the anechoic region [17]. Table 2.1 summarizes the CNR values for the different methods. Random weights in architecture 2 with two pulse-echo measurements show the highest contrast.

<table>
<thead>
<tr>
<th>Method</th>
<th>4ch$^1$</th>
<th>8ch</th>
<th>16ch</th>
<th>Rnd weight &amp; grouping$^2$</th>
<th>Rnd weight</th>
<th>Rnd grouping</th>
</tr>
</thead>
<tbody>
<tr>
<td>CNR</td>
<td>1.13</td>
<td>1.31</td>
<td>2.14</td>
<td>1.66 (2.11)</td>
<td>1.64 (2.14)</td>
<td>1.44 (1.95)</td>
</tr>
</tbody>
</table>

1. This shows that the amount of data is equal to the raw data acquired from 4 channels (also applied to 8 & 16).

2. First number represents the CNR value for 1 pulse-echo measurement and the number in the parentheses shows the CNR value for 2 pulse-echo measurements (also the same for the other two compressive schemes).
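The CNR definition quoted above can be written as a small helper. This is a sketch under assumptions: the function name `cnr` and the boolean-mask representation of the anechoic region are hypothetical, and "absolute mean inside" is interpreted here as the mean of the absolute pixel values inside the region.

```python
def cnr(image, inside_mask):
    """Ratio of mean |pixel| outside the anechoic region to mean |pixel| inside it.

    image:       2D list of pixel values
    inside_mask: 2D list of bools, True for pixels inside the anechoic region
    """
    inside, outside = [], []
    for row, mask_row in zip(image, inside_mask):
        for pixel, is_inside in zip(row, mask_row):
            (inside if is_inside else outside).append(abs(pixel))
    return (sum(outside) / len(outside)) / (sum(inside) / len(inside))

# Toy example: background magnitude 2, residual magnitude 1 inside the cyst.
toy_image = [[2.0, -2.0], [1.0, -1.0]]
toy_mask = [[False, False], [True, True]]
print(cnr(toy_image, toy_mask))
```

A value close to 1 means the anechoic region is barely distinguishable from its surroundings; larger values indicate better contrast.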

Up to this point, only the reconstruction of a cyst phantom was discussed. However, the experiment included the reconstruction of 8 point reflectors as well. For the sake of brevity, we only present the reconstruction results of the 8 point reflectors with 2 pulse-echo measurements (See Figure 2.9). Figure 2.9 top left shows 8 uniformly spaced channels, the top center figure shows the random weight compression, the top right figure shows random sample-wise grouping, and the bottom figure shows random weights + sample-wise grouping.

This report also investigates how filtering affects the signal reconstruction. It suggests that the filter in the output of the signal chain should have low-pass characteristics, and the cut-off frequency should be tuned at $f_{\text{sample}}/2$. 

![Figure 2.9 Point reflectors reconstruction – 2pulse echo measurements [17]](image)
2.4. Conclusion
Section 2.3 compared the candidate schemes, identified effective reconstruction schemes, and arrived at a feasible hardware design that can be translated into electronic building blocks. The resulting imaging scheme starts with 32 elements, adds randomness to the received signals by multiplying each signal by a randomly generated amplitude weight from [-1,1], and sums 8 randomly connected elements into 1 output. The gain weights take 8 discrete, uniformly spaced levels from -1 to 1: \(\{\pm 1/7, \pm 3/7, \pm 5/7, \pm 1\}\). It was also suggested that a filter with a flat response can be used at the output to prevent performance loss; its cut-off frequency should be tuned to \(f_s/2\) (half the sampling frequency). A block diagram of the designed system can be found in Figure 2.10. In this configuration, the 32 elements are randomly divided into 4 groups, and the signals of each group are summed in a low-pass summation node. The cut-off frequency of the filter is also denoted in the figure.
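For one sampling instant, the scheme summarized above can be sketched as follows. The function name `compress_sample` and the use of Python's `random` module are assumptions for illustration, and the low-pass filtering at the summation node is deliberately left out.

```python
import random

# The 8 discrete gain levels: +/-1/7, +/-3/7, +/-5/7, +/-1.
WEIGHTS = [k / 7 for k in (-7, -5, -3, -1, 1, 3, 5, 7)]

def compress_sample(samples, rng, n_groups=4):
    """Randomly group the element samples, weight each one, and sum per group."""
    order = list(range(len(samples)))
    rng.shuffle(order)                      # random grouping for this sample
    size = len(samples) // n_groups
    outputs = []
    for g in range(n_groups):
        members = order[g * size:(g + 1) * size]
        outputs.append(sum(rng.choice(WEIGHTS) * samples[i] for i in members))
    return outputs

# 32 element samples in, 4 compressed samples out.
out = compress_sample([1.0] * 32, random.Random(1))
print(out)
```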

![Figure 2.10 Compressive system architecture for 32 elements](image)
3. System Design
3.1. PZT behavior

The minimum detectable signal in ultrasound imaging is limited by the noise level of a transducer. In order to find the dynamic range, each transducer has to be modelled in electrical terms. Piezo-electric transducers (PZTs) have been chosen for this thesis. According to the “Butterworth-Van Dyke” model [29] (see Figure 3.1), a PZT transducer model consists of a capacitor in parallel with an RLC branch. The transducer has two resonance frequencies that lie close to each other.

![Butterworth-Van Dyke PZT model](image)

Passive element values for a transducer with a size of 150μm×150μm are provided by C. Chen et al. in [30]. We have derived the values for a 1500μm×150μm transducer by scaling these values (see Table 3.1).

<table>
<thead>
<tr>
<th>Element</th>
<th>$C_p$</th>
<th>$C_s$</th>
<th>$R_s$</th>
<th>$L_s$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td>9.30 [pF]</td>
<td>10.3 [pF]</td>
<td>0.2184 [kΩ]</td>
<td>99.2 [μH]</td>
</tr>
</tbody>
</table>

The PZT transducer and its resonance frequency model can be seen in Figure 3.2.

The resonance frequency can be written as [31]:

$$f_{\text{res}} = \frac{1}{2\pi \sqrt{L_s C_s}} = 4.97 \text{ MHz}$$
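The resonance frequency can be checked numerically from the series-branch values in Table 3.1. The variable names below are chosen for illustration only.

```python
import math

# Series-branch values from Table 3.1.
C_S = 10.3e-12   # [F]
L_S = 99.2e-6    # [H]

# Series resonance of the RLC branch: f = 1 / (2*pi*sqrt(L*C)).
f_res = 1.0 / (2.0 * math.pi * math.sqrt(L_S * C_S))
print(f"{f_res / 1e6:.3f} MHz")
```

The result lands just below 5 MHz, in line with the value quoted above up to rounding.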

The simulated magnitude of the impedance of a transducer is shown in Figure 3.3. Two resonance frequencies can also be seen from the plot.
The transducer model at resonance is also shown in Figure 3.2. The noise Power Spectral Density (PSD) is given by $\overline{v_n^2} = 4kTR_s \ [\mathrm{V^2/Hz}]$. The total integrated noise over a 50% bandwidth ($f_{center} = 5$ MHz) for each PZT transducer at the output node, obtained from simulation, is:

$$\bar{v}_{n, rms}|_{3.75M-6.25M} = 4.66 \ \mu V_{rms} \quad (2)$$

## 3.2. System Specifications

In the previous section (3.1), the noise power of a single transducer was calculated. The first block of each channel should provide enough gain to suppress the noise of the rest of the chain and, as an agreed condition, should have a noise power less than or at most equal to that of a transducer. According to the data provided by the FDA, self-heating of a probe should not result in an increase of tissue temperature [32]; hence, the power consumption of each channel should be roughly less than 1mW [11].

Each transducer element transforms mechanical pressure into an electric signal. Depending on the sound wave reflections and the distance to the object, these reflections can be strong or weak. In this work, we assume a dynamic range of 65 dB for each transducer element. Since the noise of each transducer can be modelled by the thermal noise of the resistor in the Butterworth-Van Dyke model, one can obtain the highest signal value that can be acquired from the PZT elements.

Assuming a dynamic range of 65 dB at the input of the circuit, and knowing the noise level of each PZT element (see section 3.1), the maximum signal at the input can be calculated as:

$$V_{sig, in, max} = 4.66 \ \mu V_{rms} \times 10^{65/20} = 8.29 \text{ mVrms or } 11.7 \text{ mVp} \quad (3)$$

The dynamic range of the ADC is, however, different. There are 8 different signal paths that are summed into one output. The signal amplitudes add directly, whereas the noise contributions are uncorrelated, so their powers add rather than their amplitudes. The summed signal can therefore grow by a factor of 8 while the rms noise grows by only $\sqrt{8}$, which adds a factor of $\sqrt{8}$ to the dynamic range.

$$DR_{ADC} = 65 dB + 20 \times \log_{10} \sqrt{8} = 74dB \quad (4)$$

This range can be covered with a 12-bit ADC ($12 \times 6.02 + 1.76 = 74$ dB) [33], or with an ADC with fewer bits combined with a variable gain amplifier. The variable gain must be at least 14 [dB] (74-60=14). This, of course, results in a trade-off that involves optimizing the power consumption per channel.
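The dynamic-range budget above can be reproduced in a few lines. The variable and function names are illustrative; the numbers are the ones used in this section.

```python
import math

V_NOISE_RMS = 4.66e-6   # integrated transducer noise [V rms], from section 3.1
DR_INPUT_DB = 65.0      # assumed dynamic range per element [dB]
N_SUMMED = 8            # signal paths summed into one ADC

# Maximum input signal: noise floor scaled up by the dynamic range.
v_max_rms = V_NOISE_RMS * 10 ** (DR_INPUT_DB / 20)
v_max_peak = v_max_rms * math.sqrt(2)

# Signals add in amplitude (x8), uncorrelated noise adds in power (x sqrt(8)).
dr_adc_db = DR_INPUT_DB + 20 * math.log10(math.sqrt(N_SUMMED))

def ideal_sqnr_db(bits):
    """Ideal quantization-limited SNR of an N-bit ADC for a full-scale sine."""
    return 6.02 * bits + 1.76

print(v_max_rms, v_max_peak, dr_adc_db, ideal_sqnr_db(12), ideal_sqnr_db(10))
```

The ideal figure for 10 bits comes out slightly above the rounded 60 dB used in the text for the variable-gain estimate.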

On the other hand, from an imaging point of view, ultrasound waves attenuate as they propagate inside tissue [34]. The attenuation increases with both the distance from the transmission point and the signal frequency, so the received echo amplitude is inversely related to both [35].

\[ \text{Attenuation coefficient} = \alpha \left[ \frac{\text{dB}}{\text{MHz.cm}} \right] \] (5)

Average attenuation value for a tissue is around 0.54 dB/MHz.cm [35].

So, a Time-Gain Compensation (TGC) stage is required to amplify weak echoes; it also benefits the design by allowing an ADC with a lower resolution. The 14 [dB] variable gain corresponds to a ratio of 5, which can be implemented in the circuit.
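To give a feel for these numbers, the sketch below evaluates the attenuation for a round trip to a given depth and converts the 14 dB gain step into a linear ratio. The function name and the assumption that the echo travels twice the imaging depth are illustrative and not taken from [35].

```python
ALPHA_DB_PER_MHZ_CM = 0.54   # average tissue attenuation coefficient [35]

def round_trip_attenuation_db(depth_cm, freq_mhz=5.0, alpha=ALPHA_DB_PER_MHZ_CM):
    """Attenuation in dB for a wave that travels to depth_cm and back."""
    return alpha * freq_mhz * 2.0 * depth_cm

# Linear ratio that corresponds to the 14 dB variable gain.
tgc_ratio = 10 ** (14 / 20)

print(round_trip_attenuation_db(1.0), tgc_ratio)
```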

A typical ultrasound receiver samples the signal at a rate higher than the Nyquist rate, typically around 3 to 4 times the highest signal frequency. At a center frequency of 5MHz and a 50% bandwidth, the sampling rate becomes

\[ f_{\text{sample}} = 4 \times 6.25 \text{MHz} = 25 \text{ [MHz]} \] (6)

Therefore, a 10-bit ADC that works at 25MHz is considered for this design based on [12]. The design is implemented in a 0.18 μm LV CMOS technology, and the headroom is 1.8 [V]. An input swing of 1.2 [Vpp] is a reasonable choice.

Now, the gain of the whole signal chain can be decided. Considering a 12mVp (24mVpp) amplitude at the input of the signal chain, the overall gain can be written as:

\[ A_{\text{total}} = \frac{1.2V_{pp}}{24mV_{pp}} = 50 \text{ [V/V]} \] (7)

Table 3.2 summarizes the system requirements.

<table>
<thead>
<tr>
<th>Requirement</th>
<th>Target value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bandwidth</td>
<td>3.75 MHz to 6.25 MHz (2.5 MHz)</td>
</tr>
<tr>
<td>Dynamic Range</td>
<td>65 dB</td>
</tr>
<tr>
<td>Power limit per channel</td>
<td>1mW</td>
</tr>
<tr>
<td>TGC Gain Step</td>
<td>14 dB</td>
</tr>
<tr>
<td>LNA noise$^1$ (rms)</td>
<td>4.66 μV</td>
</tr>
<tr>
<td>ADC resolution</td>
<td>10-bit</td>
</tr>
<tr>
<td>Sampling rate</td>
<td>25MHz</td>
</tr>
<tr>
<td>Overall gain of signal chain</td>
<td>50</td>
</tr>
</tbody>
</table>

$^1$ LNA Noise over the 3-dB bandwidth of the transducer
3.1. Architecture design
The Analog Front End (AFE) includes amplification, compression (by randomly weighting the signals), summation and, last but not least, digitization (See Figure 3.4).

There are 4 groups of 8 elements, which are fixed by the scheme suggested in chapter 2. After amplification and after adding randomness through one of the chosen methods (to be discussed in the rest of this chapter), the weighted signals are summed, digitized, and finally fed to the output.

The Low Noise Amplifiers (LNAs) should have low noise over the bandwidth while also having a moderate gain in order to suppress the noise of the rest of the signal chain. Furthermore, each pulse-echo measurement includes a transmit mode and a receive mode. As the signal travels into the tissue and reflects back to the receiver, it loses some of its power (see also section 3.2); hence, the signal chain needs a Time-Gain Compensation (TGC) function to compensate for the smaller echoes.

Implementing the compression scheme requires creating randomness in the signals; finally, the randomized signals should be summed into a single output.

As can be derived from section 2.4, 8 discrete gain levels that are equally distributed from -1 to 1 are required. Consequently, the gain levels are $\pm 1/7$, $\pm 3/7$, $\pm 5/7$, $\pm 1$. Theoretically, generating these signal ratios and summing them can be carried out in three domains; charge domain, voltage domain, and current domain.

In order to come up with a practical solution for the summation scheme, the floor plan of the chip and the parasitic effects it introduces have been taken into account. The chip consists of a linear array of 32 elements, each with a 150µm pitch (See Figure 3.5).
Long routing traces across the chip form parasitic capacitances that cannot be neglected. Apart from cross-coupling between metal traces, the parasitic capacitance of a 1µm wide and 5mm long *Metal1* trace to the substrate is approximately 1.12pF. With that taken into account, different architecture choices can be evaluated.

### 3.1.1. Charge summation

In order to obtain 90% accuracy in the charge domain, the summation charge should be accumulated on a capacitor 10 times larger than the parasitic capacitance, i.e. 11-12pF per group. The total capacitance of the summation nodes is 4 times the value of each individual group, about 45-50pF. Needless to say, this architecture requires a considerable amount of area. On the other hand, the ADCs run at 25 MHz, which means the capacitors should be charged within a 40ns time frame. Charging a 50pF capacitance with a 1V driver in 40ns requires:

\[
I = \frac{C \cdot V}{t} = \frac{50\text{pF} \cdot 1\text{V}}{40\text{ns}} = 1.25 \text{ [mA]}
\] (8)

And this only accounts for the large-signal behavior. Regarding the small-signal behavior, the signal on this node should settle within 40ns, which requires a very small resistance at that node. Requiring the node to settle within five time constants (5τ), one can write:

\[
5RC < 40\text{ns} \quad \& \quad C \approx 12\text{pF} \quad \rightarrow \quad R < 667\Omega
\] (9)

This resistance value requires a very large bias current and shows that this choice is not a very promising design.
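Both bounds for the charge-domain option follow from one-line formulas, sketched below with illustrative function names.

```python
T_SAMPLE = 1 / 25e6   # one ADC sampling interval at 25 MHz, i.e. 40 ns

def charging_current(c, v, t=T_SAMPLE):
    """Large-signal estimate: I = C * V / t."""
    return c * v / t

def max_settling_resistance(c, t=T_SAMPLE, n_tau=5):
    """Small-signal estimate: n_tau * R * C < t  ->  R < t / (n_tau * C)."""
    return t / (n_tau * c)

print(charging_current(50e-12, 1.0))      # all four summation nodes together
print(max_settling_resistance(12e-12))    # one 12 pF summation node
```

The resistance bound comes out at a few hundred ohms, which is the reason the charge-domain option is considered unattractive.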

### 3.1.2. Voltage summation

Voltages cannot be summed naturally. Summation requires an additional amplifier with adequate feedback connections. Aside from the LNA, a buffer is required to drive the large capacitance that sits at the summation node. Then these voltages should be summed using an amplifier. Also, there are stray impedances across the chip that form voltage dividers, so the voltage that reaches the summation node could vary with the length of the path. Voltage summation gives a rather complex circuit configuration with multiple connections for the summing amplifier. However, it still remains a possible and valid solution for the summation.

**3.1.3. Current summation**

Current summation, however, takes place naturally through Kirchhoff's current law. If all the currents are tied to one node, the output current is the sum of the entering currents. The only requirement imposed on a current summation scheme is that a virtual ground should be created with a lower input impedance than that of the parasitic capacitor, so that it absorbs most of the entering current. A capacitor of 1.12pF has an impedance of 28.4kΩ at 5MHz; a 10 times smaller impedance means creating a 2.84 kΩ impedance at the summation node (in order to achieve 90% accuracy). On the other hand, a current is not attenuated by travelling across the chip, whereas a voltage is attenuated by the voltage divisions occurring in the transmission line.
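The impedance figures for the current-summation node can be checked as follows. The names are illustrative, and the "captured fraction" is a magnitude-only current-divider estimate that ignores the phase difference between the resistive and capacitive branches.

```python
import math

def cap_impedance_mag(c, f):
    """Magnitude of a capacitor's impedance: |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * f * c)

z_par = cap_impedance_mag(1.12e-12, 5e6)   # parasitic trace capacitance at 5 MHz
z_in_target = z_par / 10                    # virtual ground 10x lower

# Magnitude-only divider: fraction of the current absorbed by the virtual ground.
captured = z_par / (z_par + z_in_target)

print(z_par, z_in_target, captured)
```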

**3.2. Designed system**

To conclude, the signals of 8 randomly chosen elements should be amplified to a certain degree, transformed into currents, and summed, and the resulting output voltage should be digitized. The random weights are added in the current domain; hence, the RND multiplier (shown in Figure 3.4) is inherently included in the trans-conductance amplifiers (See Figure 3.6).

![Figure 3.6 Signal chain architecture for a group of 8](image)

The bottom plates of the transducer elements are all grounded. So, the LNA has a single-ended input connected to the PZT top plate and a single-ended output. However, from the trans-conductance amplifier onwards, signals are generated differentially to suppress common-mode interference.

The gain of the LNA should be fixed, as the linearity of the next stage, the trans-conductance amplifier, depends on the signal swing at its input. Hence, including gain programmability in the LNA would only make the design more complex. The TGC function could also be implemented in the TIA with two sets of feedback networks. Each network is a resistor and a capacitor in parallel around the TIA (see section 2.3) that create a low-pass filter characteristic. Although adding two sets of feedback networks is possible, adding only two resistors for the TGC implementation in the trans-conductance amplifier has been chosen in this thesis work.

### 3.2.1. LNA Architecture

Thus far, it has been clarified that the LNA must have a well-defined gain, its noise level should be equal to or less than that of a transducer, and its bandwidth should be 2.5MHz (3.75M-6.25M). An OTA with capacitive feedback creates a well-defined and fixed gain (over a large span of frequency), and the feedback network does not add noise to the circuit [36].

![LNA block diagram](image)

Figure 3.7 LNA block diagram

Assuming that the impedance of the transducer is much smaller than that of \(C_{\text{in}}\) in the receive band, Figure 3.7 shows the block diagram of a capacitive-feedback LNA. The voltage transfer function can be written as

\[
V_{\text{out}} = \frac{V_{\text{in}}}{Z_{\text{in}}(s)} \cdot Z_{\text{fb}}(s) = \frac{C_{\text{in}}}{C_{\text{fb}}} \cdot V_{\text{in}}
\] (10)

In order to maximize the power efficiency, a single-ended inverter-based amplifier is used [36]. The CMOS inverter-based LNA with its feedback network is illustrated in Figure 3.8.

![Inverter-based LNA](image)

Figure 3.8 Inverter-based LNA

This topology shows a poor PSRR (Power Supply Rejection Ratio) compared to a differential circuit. Since the design is a single-ended inverter, the supply of the LNA circuits along the chip has been isolated from the rest of the circuitry.

Using this configuration, the \(g_m\) of the two transistors are summed. The noise transfer function can be written as (see also Figure 3.9):

\[
\overline{V_{\text{out},n}} = \overline{V_{\text{n}}} \cdot \left(1 + \frac{C_{\text{in}}}{C_{\text{fb}}}\right)
\] (11)

And the input-referred noise is the same value divided by the transfer function of the circuit:

$$\overline{V_{in,n}} = \frac{\overline{V_{out,n}}}{A_{CL}} = \left(1 + \frac{C_{fb}}{C_{in}}\right) \cdot \overline{V_n} \quad (12)$$

For a nominal gain of 10, $1.1V_{n,\text{rms}}$ is referred to the input, which means the noise of the input transistors appears almost unchanged at the input of the circuit.

Parasitic capacitances at the input of OTA affect the noise transfer function (NTF). Figure 3.10 represents the circuit model in the presence of input capacitance.

The input-referred noise of the circuit can be written as

$$\overline{V_{in,n}} = \left(1 + \frac{C_p + C_{fb}}{C_{in}}\right) \cdot \overline{V_n} \quad (13)$$

By re-writing equation (13), it can be seen that

$$\overline{V_{in,n}} = \left(1 + \frac{1 + \frac{C_p}{C_{fb}}}{\frac{C_{in}}{C_{fb}}}\right) \cdot \overline{V_n} \quad (14)$$

At a fixed gain value ($\frac{C_{in}}{C_{fb}} = \text{const.}$) increasing both $C_{in}$ and $C_{fb}$ helps in suppressing the effect of parasitic capacitance in NTF, but it also reduces the bandwidth of the circuit by increasing the output capacitance of the circuit. Also, increasing $C_{in}$ and $C_{fb}$ results in a larger LNA. Therefore, noise trade-off should be optimized between power consumption and area consumption.
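The effect of scaling the capacitors at a fixed gain can be illustrated numerically. The capacitor values below are hypothetical and only serve to show the trend; the function name is likewise illustrative.

```python
def input_noise_factor(c_in, c_fb, c_p=0.0):
    """Factor by which the OTA noise is referred to the input: 1 + (C_p + C_fb) / C_in."""
    return 1.0 + (c_p + c_fb) / c_in

# Gain of 10 (C_in / C_fb = 10) with a hypothetical 200 fF input parasitic.
small = input_noise_factor(c_in=1e-12, c_fb=100e-15, c_p=200e-15)
large = input_noise_factor(c_in=4e-12, c_fb=400e-15, c_p=200e-15)
ideal = input_noise_factor(c_in=1e-12, c_fb=100e-15)

print(small, large, ideal)
```

Scaling both capacitors by the same factor keeps the gain but moves the noise factor toward the parasitic-free value, at the cost of area and bandwidth as noted above.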

### 3.2.2. Trans-conductance amplifier Architecture

The trans-conductance amplifier has probably the most complex architecture in the whole signal path. It should realize the TGC function as well as the random weighting of the signals. The digital circuitry that controls the compression scheme is also part of the trans-conductance amplifier, but it will be covered in section 3.3, Compressive scheme implementation. The trans-conductance amplifier has to transform the voltage signal into a current. The signal swing at the LNA input varies from the minimum detectable signal (~4.66μVrms or 6.59μVpeak) to the maximum amplitude (~16mVpeak); amplified by the LNA gain, this gives ~132μVpp to 320mVpp at the input of the trans-conductance amplifier. The trans-conductance amplifier has to remain linear within this range; otherwise, a \( g_m \) that changes with signal amplitude will lead to signal-dependent tones (harmonics) that limit the dynamic range and decrease the SNR.

Several structures have been proposed in order to maintain \( g_m \) linear in [37] and a comprehensive comparison between the most conventional trans-conductor structures can be found in [38]. The architecture that is chosen for this stage is similar to the one described in [39]. The simplest way to transform a voltage signal into current is to use a resistor if the voltage is kept constant.

\[
I_R = \frac{V_{in}}{R} \tag{15}
\]

A voltage follower (VF) can be used in order to buffer the voltage and reduce the output resistance (see Figure 3.11). The current source has a finite output impedance of \( R_{source} \).

![Figure 3.11 A common-source(VF)](image)

Trans-conductance of the circuit in Figure 3.11 can be written as [39]

\[
G_m = \frac{g_m}{1 + g_mR_{source}} \tag{16}
\]

As the \( g_m \) of an NMOS transistor varies with different process and temperature conditions, in order to create a well-defined \( G_m \), its value must only depend on \( R_{source} \). And that implies that \( R_{source} \gg 1/g_m \) and \( G_m \) can be written as

\[
G_m = \frac{1}{R_{source}} \tag{17}
\]

One can increase \( R_{source} \) or \( g_m \) values. Increasing the \( g_m \) using circuit techniques is an optimal choice. By adding a feedback loop around the input transistor, \( g_m \) will be multiplied by the open loop gain of the auxiliary amplifier (see Figure 3.12) [37].
Using CASFVF structure [38] the output conductance of the circuit can be written as (also see Figure 3.13)

\[ g = A \cdot g_m = g_m (g_m r_{out})^2 \] (18)
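The benefit of boosting \( g_m \) can be seen by evaluating the degenerated trans-conductance for a spread of transistor \( g_m \) values. The resistance, the \( g_m \) values, and the boost factor of 1000 below are hypothetical and chosen only to show the trend.

```python
def degenerated_gm(gm, r_source):
    """G_m = g_m / (1 + g_m * R_source)."""
    return gm / (1.0 + gm * r_source)

R_SOURCE = 10e3   # hypothetical degeneration resistance [ohm], so 1/R = 100 uS

# Bare transistor with a +/-20 % spread in g_m.
bare = [degenerated_gm(gm, R_SOURCE) for gm in (0.8e-3, 1.0e-3, 1.2e-3)]

# Same spread after boosting g_m by a loop gain of 1000.
boosted = [degenerated_gm(1000 * gm, R_SOURCE) for gm in (0.8e-3, 1.0e-3, 1.2e-3)]

print(bare)
print(boosted)
```

With the boosted \( g_m \), the result stays within a small fraction of a percent of \( 1/R_{source} \) across the whole spread.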

From this stage on, the signal chain is differential. A degeneration resistance is used to define the \( g_m \), and Kelvin switches have been added to the design to prevent any signal loss [39] (See Figure 3.14). The dc currents pass through switches S1 and S2, while the signal appears across the two terminals of the resistor through S3 and S4 without any voltage drop.
Using the equivalent differential half-circuit model, the differential trans-conductance can be written as

$$G_{m,\text{diff}} = \frac{2}{R_s} \quad (19)$$

Now, the current signal passes through $M_3$ and $M_6$. The only remaining task is to copy the signal out. By adding two diode-connected NMOS transistors on top of $M_3$ and $M_6$ (in Figure 3.14) current signal can be copied out.

Figure 3.15 shows the whole trans-conductance amplifier, which includes the differential CASFVF circuit, a copy cell, Kelvin connections, and different gain settings ($R_{s1}$ and $R_{s2}$) for TGC purposes. The random weighting of the signals is also carried out in this block. Since the signal has been converted into a current, different weights correspond to different multiplication factors of the current. The random weights are ±1/7, ±3/7, ±5/7, and ±1. The actual current values are irrelevant as long as the proportions between the output currents are maintained. The sign of the signal can be set by using a chopper switch at the output of the copy cells. The last step is to scale the current by using 7 unit cells of i₀, where i₀ represents the absolute value of the minimum output current (See Figure 3.16).

![Figure 3.16 Copy cells and chopper switches](image)

By properly activating the chopper switches, all gain weights and signs can be generated. Also, the copy cell outputs are tied to two nodes; where according to Kirchhoff’s current law, output current is the sum of input currents.
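One possible mapping from a 3-bit control code to a weight is sketched below. The bit assignment (two bits for the number of active unit cells, one bit for the chopper sign) is a hypothetical encoding for illustration; the text does not specify the actual one.

```python
def weight_from_code(code):
    """Map a 3-bit code to one of the weights +/-1/7, +/-3/7, +/-5/7, +/-1.

    Assumed encoding: bits 0-1 select 1, 3, 5 or 7 active unit cells (i0 each),
    bit 2 sets the chopper, i.e. the sign of the output current.
    """
    n_cells = 2 * (code & 0b11) + 1
    sign = -1 if (code >> 2) & 1 else 1
    return sign * n_cells / 7

weights = sorted(weight_from_code(code) for code in range(8))
print(weights)
```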

### 3.2.3. TIA Architecture

The TIA’s role is to sum the currents of 8 elements and amplify the signal to the span of ADC input range. As can be deduced from Figure 3.5 and Figure 3.6, large capacitances (routings to the inputs of TIA) are sitting in the input nodes of the TIA. In the design of TIA, three main constraints are imposed on the design; i.e. noise, bandwidth, and input impedance in current division.

First, the noise is mainly determined by the thermal noise of the TIA’s input pair.

\[
\overline{v_{n,m}}^2 = 4kT \cdot \gamma g_{m,TIA},
\]

\[
\overline{v_{n,out}}^2 = \overline{v_{n,m}}^2 \cdot \left(g_{m,TIA}r_{out,TIA}^2\right)
\]

By dividing it by the feedback network impedance, the trans-conductance amplifier transfer function, and the LNA gain, an equivalent noise voltage referred to the input of the chain can be obtained. The noise power of each TIA is spread over the 8 channels connected to it; hence, the noise of a single TIA should be sufficiently below the combined noise power of the 8 channels. It can be seen that noise is not a stringent constraint in the design of the TIA.

Second, the time constant at the input is given by \( \tau = RC \), where \( R \) denotes the resistance at the input node of the TIA and \( C \) is the parasitic capacitance at the same node. The voltage on the input node should settle within one sampling interval (40 ns at \( f_{\text{sample}} = 25 \) MHz). Settling within five time constants, which corresponds to an accuracy of about 99.3%, requires

\[5\tau = 5RC < 40\text{ns} \] (22)
This gives the second constraint in designing the amplifier.

The last constraint is the input impedance of the TIA, which should be at least 10 times smaller than the parasitic impedance.

\[
Z_{in} = \frac{v_{in}}{i_{in}} = \frac{v_{in}}{g_m v_{in}} = \frac{1}{g_m}
\] (23)

TIA configuration in the presence of parasitic impedances has been illustrated in Figure 3.18.

For the worst case, the parasitic impedances have been calculated at 25MHz.

\[
\frac{1}{g_m} \ll \frac{1}{C_p s} \quad @25MHz
\] (24)
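The settling and input-impedance constraints can be put into numbers as follows. The parasitic capacitance is assumed here to equal the 1.12 pF *Metal1* trace estimate from the floor-plan discussion; the actual value at the TIA input may differ.

```python
import math

F_SAMPLE = 25e6
C_PAR = 1.12e-12   # assumed parasitic capacitance at the TIA input [F]

# Residual error after five time constants: exp(-5).
settling_error = math.exp(-5)

# Settling constraint: 5 * R * C < one sampling interval.
r_max = (1 / F_SAMPLE) / (5 * C_PAR)

# Input-impedance constraint: 1/g_m at least 10x below |Z_Cp| at 25 MHz.
z_par = 1 / (2 * math.pi * F_SAMPLE * C_PAR)
gm_min = 10 / z_par

print(settling_error, r_max, z_par, gm_min)
```

Under these assumptions, the input-impedance constraint is the more demanding of the two, since it asks for a trans-conductance in the millisiemens range.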

Overall, a large trans-conductance value (obtained from (20), (22), and (23)) is required in the design of this stage. The output of the TIA is directly connected to the ADC sampling capacitors, and there is no driver stage in the design. Hence, the amplifier should be capable of driving two capacitors \( (C_{load} = 2pF) \). There are two important design parameters: 1) a large trans-conductance and 2) a large GBW product. Among the available design choices, a cascode stage (either telescopic or folded) and an inverter-based amplifier are both good choices for a large GBW product (See Figure 3.19). Both structures can provide sufficient open-loop gain.
Figure 3.19 Telescopic cascode and inverter based configurations

However, with the same bias current, an inverter-based amplifier has a total trans-conductance of about twice that of the corresponding cascode amplifier. Thus, an inverter-based amplifier is the favoured design choice.

The feedback network is determined by the compressive scheme simulations. According to [17], the frequency response of the amplifier must be equivalent to a low-pass filter; all the data in the frequency spectrum must be amplified equally, and the cut-off frequency of the circuit must be equal to half the sampling rate (12.5 MHz for \(f_{\text{sample}} = 25\) MHz) in order to prevent aliasing. The \(R\) and \(C\) values of the feedback network are determined by the overall gain of the signal chain and the cut-off frequency.
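Once the feedback resistance is fixed by the gain requirement, the feedback capacitance follows from the first-order cut-off relation \(f_c = 1/(2\pi RC)\). The 10 kΩ value below is a hypothetical example, not the value used in the design.

```python
import math

def feedback_capacitance(r_fb, f_cutoff=12.5e6):
    """C = 1 / (2*pi*R*f_c) for a parallel RC feedback network."""
    return 1.0 / (2.0 * math.pi * r_fb * f_cutoff)

c_fb = feedback_capacitance(10e3)   # hypothetical R_fb = 10 kOhm
print(c_fb)
```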

Figure 3.20 shows the amplifier, feedback network and loads.

Figure 3.20 TIA feedback network
3.2.4. ADC Architecture

This work requires an ADC for a sampling frequency of 25MHz. For moderate frequencies such as 25MHz, SAR ADC is one of the logical choices as it is power efficient. As has been decided earlier in section 3.2, a 10-bit ADC is chosen for this work. Hence, the 10-bit Charge-Sharing SAR ADC (CSSAR) from [12] has been used in this project. Typical SAR ADCs use power-hungry reference buffers in order to achieve settling accuracy and speed. On the other hand, a CSSAR ADC creates reference voltages by pre-charging the CDAC during the sampling phase which relaxes the power constraints. Furthermore, this ADC uses a capacitor bank that consists of only 67 unit caps instead of 1024 which reduces the area considerably [40]. So, the main power is being spent in the dynamic comparator of the ADC. This ADC enables a power efficient sampling of the signal while also occupying a rather modest area.

The original ADC operates at a 60MHz clock rate and has been optimized for an input range of 800mVpp; however, it can accommodate up to 1.2Vpp. This thesis uses the maximum range of 1.2Vpp, and the ADC is clocked at 25MHz.

3.3. Compressive scheme implementation

According to [41] [42] [16] [17], and as discussed in chapter 2, the compression scheme requires circuitry that creates randomness. There are several ways to generate a pseudo-random sequence using digital circuitry. However, one has to verify that the generated sequence is random enough for the compression to work.

One way is to make use of Linear-Feedback Shift Registers (LFSRs) [43]. As the name suggests, the circuit consists of a few registers and a few logic gates that are connected in a feedback configuration. In general, a shift register of length \( N \) (number of registers) can generate a sequence with a maximal length of \( m = 2^N - 1 \), but the actual length depends on the feedback taps and the initial state of the shift register [44]. Several tables [45] summarize the polynomials, sequence lengths, and tap numbers.
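The dependence of the sequence length on the feedback taps can be checked with a few lines of Python. The sketch below models a Fibonacci-style LFSR with XOR feedback and counts how many clock cycles it takes to return to the seed. The tap positions (10, 7) are an assumption taken from commonly tabulated maximal-length choices for a 10-bit register; they are not necessarily the taps used in this design.

```python
def lfsr_step(state, n_bits=10, taps=(10, 7)):
    """Advance a Fibonacci LFSR by one clock; taps are 1-indexed stage numbers."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << n_bits) - 1)


def lfsr_period(seed, n_bits=10, taps=(10, 7)):
    """Number of clocks until the register returns to its seed."""
    state = lfsr_step(seed, n_bits, taps)
    count = 1
    while state != seed:
        state = lfsr_step(state, n_bits, taps)
        count += 1
    return count


print(lfsr_period(seed=1))   # period for the assumed (10, 7) taps
print(lfsr_step(0))          # the all-zeros state maps onto itself
```

With the assumed taps the register visits every non-zero state before repeating, while the all-zeros state never leaves itself, which is why the maximal length is $2^N - 1$ rather than $2^N$.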

3.3.1. LFSR Design

The compression scheme consists of 8 gain levels, which can be encoded into 3 bits. Creating a random sequence requires a register longer than 3 bits. Although more bits result in more randomness, there is a trade-off between performance and design complexity. LFSRs with 11 bits or less (except for 8-bit) require only 2 taps in the feedback, which makes the design simpler and more power efficient. A 10-bit LFSR has been used to generate 3 bits in each sampling cycle (B0, B1, B2, which can be seen in Figure 3.21). In order to generate these 3 bits, an arbitrary combination of 6 bits out of 10 has been investigated. The designed system is shown in Figure 3.21. As can be seen, 3 pairs of bits are XORed separately.
A histogram of the generated 3-bit numbers is a good measure of the randomness of the generated sequence. This histogram is plotted in Figure 3.22, where the y-axis denotes the number of occurrences of each number (from 0 to 7). The sum of all the bars is the total length of the random sequence, i.e. $2^{10} - 1 = 1023$; the one state missing from 1024 is the all-zeros seed, which stops the sequence generation because the register then stays at ten 0's in every cycle.
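The histogram test can be reproduced in software. The following sketch runs a 10-bit LFSR over one full period, XORs three pairs of register bits into a 3-bit code, and counts how often each code occurs. Both the feedback taps (10, 7) and the bit pairing in `PAIRS` are hypothetical choices made for illustration; the actual taps and pairing are those of Figure 3.21.

```python
def lfsr_states(seed=1, n_bits=10, taps=(10, 7)):
    """Yield every state of one LFSR period, starting from the seed."""
    mask = (1 << n_bits) - 1
    state = seed
    while True:
        yield state
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & mask
        if state == seed:
            return


# Hypothetical pairing of six of the ten register bits (0-indexed).
PAIRS = ((0, 5), (1, 6), (2, 7))


def gain_code(state, pairs=PAIRS):
    """Form the 3-bit code (B2 B1 B0) by XORing each pair of register bits."""
    code = 0
    for k, (i, j) in enumerate(pairs):
        code |= (((state >> i) ^ (state >> j)) & 1) << k
    return code


histogram = [0] * 8
for s in lfsr_states():
    histogram[gain_code(s)] += 1

print(histogram, sum(histogram))
```

For this assumed configuration the counts are as uniform as a 1023-state sequence allows: code 0 occurs once less than the others, because the all-zeros register state is never visited.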

Another way to interpret the data is through a scatter plot. Figure 3.23 shows the scatter plot of the generated numbers; there is no clustering of the data in the range. One can argue that these are only intuitive ways of evaluating randomness; hence, the generator was also put into simulation.
It was used as the pseudo-random sequence generator in the MATLAB simulations of P. van der Meulen [17], and it showed no meaningful deviation from the original results.
4. Circuit implementation and simulation results
In the previous chapter, the design and requirements of the separate blocks and of the system have been presented. This chapter verifies the design by presenting extensive simulation results. First, the pre- and post-layout simulation results of each stage are presented. Then, the pre-layout and post-layout simulations of a group of 8 elements are presented. These are verified by carrying out a MATLAB simulation of the programmed circuit and matching its outputs to the output codes (bitstream). It should be noted that the whole chip consists of 4 groups of 8 elements working in parallel, and only one group's output is shown in the end. The rest are replicas of the same group.

4.1. LNA

4.1.1. Circuit implementation

Figure 4.1 shows the main branch of the LNA. The cascode transistors and the input PMOS are connected to bias voltages. However, the DC operating point of the input NMOS transistor is set through a DC servo loop that is shown in Figure 4.2 [11].

Choosing the sizes of M1 and M4 is a trade-off. Both are large in order to suppress their flicker noise, but they should not be so large that they increase the noise at the input (by increasing the noise transfer function from \( M_{1,n} \) to \( V_{in} \)). The cascode transistors (M2 and M3), however, are small in order to reduce the parasitic capacitance they introduce. A summary of the transistor sizes is given in Table 4.1.
Table 4.1 LNA transistor sizes

<table>
<thead>
<tr>
<th>Transistor</th>
<th>W/L</th>
<th>No. of fingers</th>
</tr>
</thead>
<tbody>
<tr>
<td>M1</td>
<td>16.0 µm / 500 nm</td>
<td>6</td>
</tr>
<tr>
<td>M2</td>
<td>2.0 µm / 180 nm</td>
<td>20</td>
</tr>
<tr>
<td>M3</td>
<td>2.0 µm / 180 nm</td>
<td>15</td>
</tr>
<tr>
<td>M4</td>
<td>11.0 µm / 180 nm</td>
<td>4</td>
</tr>
</tbody>
</table>

Figure 4.2 shows the DC servo loop of the LNA circuit. It is a single-ended gm-stage that delivers current to the gate of the input transistor. A large capacitor is placed at the output of this stage in order to limit its bandwidth (for noise reduction and stability purposes).

![Figure 4.2 DC Servo Loop for LNA](image)

4.1.2. Layout

Figure 4.3 shows the layout of this stage. The input node can be seen on the left side of the layout, while the output of the LNA is shown in the upper side.
4.1.3. Simulation results

The LNA’s configuration has been proposed in chapter 3. A moderate gain of 10 (20dB) has been chosen for this stage. The noise requirement is also given in chapter 3 (total rms noise = 4.66µV). In addition, the LNA should have a flat frequency response in the signal band (3.75MHz to 6.25MHz). Here we compare the simulation results to these design specifications.

Power Consumption: In the pre-layout simulations, the LNA consumes 192 µA in total. The main branch draws about 153 µA, and the rest is spent in generating the bias voltages and in the DC servo loop. In the post-layout simulations, the LNA draws about 195 µA.

Input-referred Noise: The input-referred noise of the LNA in the pre- and post-layout simulations is given in Table 4.2. The pre-layout value is in line with the level required in section 3.2 (4.66 µV), while the post-layout value is slightly higher.

Table 4.2 Noise performance of the LNA (rms values)

<table>
<thead>
<tr>
<th>Noise</th>
<th>Pre-layout sim</th>
<th>Post-layout sim</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td>4.65µV</td>
<td>4.87µV</td>
</tr>
</tbody>
</table>
Figure 4.4 Frequency response of LNA (pre-layout simulation)

Figure 4.4 shows the frequency response of the LNA in the pre-layout simulations. The mid-band gain and the bandwidth can be seen in the plot. This stage was designed to have a fixed gain of 10, and the plot shows 9.77 (19.8 dB). It is also required that the frequency response is flat in the receive band from 3.75MHz to 6.25MHz (5MHz centre frequency, 50% bandwidth), which is satisfied in the plot.

Figure 4.5 shows the post-layout simulation of this circuit. A slight degradation can be seen in the gain magnitude.
Figure 4.6 shows the step response of the LNA in the pre-layout simulation. A step of 50mV on top of a DC level of 1.2V is applied to the LNA, with a rise time of 100ns. The input is also shown in this figure, and the output is the red trace. Figure 4.7 shows the same test for the post-layout simulation.

There is no significant difference that can be spotted between the two figures.
Table 4.3 shows the THD value of the output signal of the LNA for an input signal of 32mVpp and an output swing of 300mVpp (the gain is slightly smaller than 10 in the post-layout simulations). The THD value is below 1%, which means that it gives at least 40dB of dynamic range.

<table>
<thead>
<tr>
<th>THD</th>
<th>Post-layout sim</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td>0.132%</td>
</tr>
</tbody>
</table>
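The link between a THD of 1% and 40dB can be made explicit by expressing the amplitude ratios in decibels. The short sketch below does this for the THD target and the simulated THD, and also for the simulated LNA gain of 9.77; the helper names are illustrative only.

```python
import math


def ratio_to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(ratio)


def thd_percent_to_db(thd_percent):
    """Express a THD given in percent as a level relative to the fundamental."""
    return ratio_to_db(thd_percent / 100.0)


print(f"gain 9.77    -> {ratio_to_db(9.77):.1f} dB")
print(f"THD 1 %      -> {thd_percent_to_db(1.0):.1f} dB")
print(f"THD 0.132 %  -> {thd_percent_to_db(0.132):.1f} dB")
```

A THD of 1% places the harmonics 40dB below the fundamental, so any simulated THD below 1% satisfies the 40dB requirement with margin.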

4.2. Trans-conductance amplifier

4.2.1. Circuit implementation

The trans-conductance amplifier is connected to the output of the LNA. Since the output of the LNA is single-ended, a bias voltage equal to the DC level of the LNA output should be connected to the other input of the trans-conductance amplifier. Figure 4.8 shows the schematic of the trans-conductance stage [39].

As stated above, \( V_{in-} \) is connected to the output of the LNA; \( V_{in+} \) is connected to a circuit similar to Figure 4.1 with its input grounded. This bias voltage is shared among a group of 8 trans-conductance amplifiers.

4.2.2. Layout

Figure 4.9 shows the trans-conductance amplifier layout. On the right side, there are two divisions. The right division contains all the switches, shift registers, MUXs, and the rest of the digital gates. It is shielded from the rest of the circuit by a p-sub ring. The other division contains the output switches and copy cells, which are placed adjacent to one another. The bulk of the layout, on the upper left, is taken up by the main branch transistors, which are placed and connected as symmetrically as possible.

Figure 4.9 Trans-conductance stage layout

Figure 4.10 shows an LNA that is connected to the trans-conductance stage. The remaining empty space is occupied by metal fill and de-coupling capacitors. In one of the trans-conductance amplifiers, the void between the LNA and the right-side columns is filled with a dummy LNA that creates the bias voltage for a group of 8 elements (shared among 8 trans-conductors). This can be seen later in chapter 5.

Figure 4.10 LNA connected to a trans-conductance
4.2.3. Simulation results

Power Consumption: In the pre-layout simulations, the trans-conductance stage consumes 299 $\mu$A in total. Each branch of the differential stage draws 100 $\mu$A, and each copy cell (see chapter 3) draws about 7 $\mu$A. The rest is spent in the biasing circuit and the output CMFB. The post-layout simulations show that the trans-conductance stage draws 292 $\mu$A.

4.2.3.1. First gain setting (5k$\Omega$ resistor)

The input waveform of the trans-conductance circuit for the first gain setting is a sinusoid with an amplitude of 160mV$_p$, which is the maximum input signal. For the second gain setting, the input is reduced to maintain the linearity of the circuit.

![Figure 4.11 Trans-conductance differential output current(pre-lay out) for 5k$\Omega$](image)

The effective trans-conductance of the block can be calculated as:

\[
G_{m, total} = \frac{i_{out, pp}}{v_{in, pp}} \approx \frac{48.4 \mu A}{320 mV} = 0.15 mS
\]

\[
\frac{1}{G_m} = 6.6 k\Omega
\]

The copy cells copy only half of the current, so the measured trans-conductance is half that of the input stage. The equivalent resistance of the input stage is therefore:

\[
\frac{1}{2 \cdot G_m} = 3.3 k\Omega
\]
In the real circuit, $R_s/2 = 2.5k\Omega$, which is somewhat different from the value obtained above. This is because not all of the small-signal current flows through the resistor; the amplifier absorbs a portion of it, which results in a higher equivalent resistance.
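The arithmetic above can be reproduced with a small helper, which is convenient when the same calculation is repeated for other gain settings. The function and variable names below are illustrative; the numbers are the simulated peak-to-peak values quoted in the text.

```python
def effective_gm(i_out_pp, v_in_pp):
    """Effective trans-conductance from peak-to-peak output current and input voltage."""
    return i_out_pp / v_in_pp


def equivalent_resistances(i_out_pp, v_in_pp):
    """Return (1/Gm, 1/(2*Gm)) in ohms for the measured swing."""
    gm = effective_gm(i_out_pp, v_in_pp)
    return 1.0 / gm, 1.0 / (2.0 * gm)


# First gain setting, pre-layout: 48.4 uA peak-to-peak for a 320 mV peak-to-peak input.
gm_first = effective_gm(48.4e-6, 320e-3)
r_full, r_half = equivalent_resistances(48.4e-6, 320e-3)

print(f"Gm = {gm_first * 1e3:.2f} mS")
print(f"1/Gm = {r_full / 1e3:.1f} kOhm, 1/(2 Gm) = {r_half / 1e3:.1f} kOhm")
```

The same helper applies to the second gain setting by passing the corresponding current and voltage swings.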

Figure 4.11 shows the differential trans-conductance stage output current.

![Differential Transconductance Stage Output Current](image)

**Figure 4.12** Trans-conductance differential output current (post-layout) for 5kΩ

Figure 4.12 shows the differential output current. Switching spikes are visible on the waveform. Despite all the isolation in the layout, these spikes could not be prevented; however, the timing of the sampling was tuned so that the spikes do not appear in the samples. Other than that, the waveform looks symmetric. The THD value (shown in Table 4.4) is about 2.9%, which is higher than the 1% target. For lower amplitudes, the THD improves.

**Table 4.4** THD values for pre and post-layout simulations (5kΩ)

<table>
<thead>
<tr>
<th>THD Value</th>
<th>Pre-layout sim</th>
<th>Post-layout sim</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td>0.659%</td>
<td>2.89%</td>
</tr>
</tbody>
</table>

### 4.2.3.2. Second gain setting (1kΩ resistor)

Now, the same simulations are carried out for the other gain setting, with the 1kΩ resistor in the circuit. The input signal is reduced so that the output does not saturate: a 16mV$_p$ sinusoid is applied to the input of this stage.

Figure 4.13 shows the output current for the applied input signal.
Figure 4.13 Trans-conductance differential output current (pre-layout) for 1kΩ
The effective trans-conductance of the block can be calculated as:

\[ G_{m,\text{total}} = \frac{i_{out,pp}}{v_{in,pp}} \approx \frac{23\,\mu A}{32\,mV} = 0.72\,mS \]

In the same way as for the previous gain setting, it can be written:

\[ \frac{1}{G_m} = 1.39\,k\Omega \]

It can also be written:

\[ \frac{1}{2 \times G_m} = 0.7\,k\Omega \]

In the actual circuit, however, \( \frac{R_s}{2} = 0.5k\Omega \). The same explanation as in the previous section applies here. Figure 4.14 shows the same plot for the post-layout simulations. Again, the spikes can be spotted in the figure. The trans-conductance obtained from the post-layout simulations, shown below, is slightly lower than in the pre-layout simulations:

\[ G_{m,\text{total}} = \frac{i_{out,pp}}{v_{in,pp}} \approx \frac{22\,\mu A}{32\,mV} = 0.69\,mS \]

Figure 4.14 Trans-conductance differential output current (post-layout) for 1kΩ
4.3. Trans-Impedance Amplifier (TIA)

4.3.1. Circuit implementation

Figure 4.15 shows the Trans-Impedance Amplifier (TIA) and its CMFB circuit; the feedback network, with R and C in parallel, is shown on the right side of the figure. This stage drives the sampling capacitors of the SAR ADC, which are 2pF on each output node.

4.3.2. Layout

Figure 4.16 shows the TIA layout. The two feedback capacitors sit on the left side of the TIA, beneath the input transistors. The rest of the circuit, including the input transistors, current source, CMFB circuit, and biasing circuit, can also be seen.
4.3.3. Simulation results

**Power Consumption**: In the pre-layout simulations, the TIA consumes 426\(\mu\)A. This current is shared among 8 channels, which amounts to 53 \(\mu\)A per channel. The post-layout simulations show 425\(\mu\)A.

**Noise**: Table 4.5 shows a summary of the noise values of the TIA. The noise values are reported over the bandwidth from 3.75MHz to 12.5MHz (the cut-off frequency of the TIA). Of the three criteria applied in the TIA design, the bandwidth requirement was dominant rather than the noise requirement; hence, the noise of the TIA does not play a significant role in the noise performance of the signal chain and the final SNR.

Table 4.5 TIA noise report

<table>
<thead>
<tr>
<th></th>
<th>Input referred (current noise)</th>
<th>Output referred (voltage noise)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pre-layout</td>
<td>11.44 nA</td>
<td>30.74 µV</td>
</tr>
<tr>
<td>Post-layout</td>
<td>11.52 nA</td>
<td>30.85 µV</td>
</tr>
</table>

**AC Response**: The TIA is supposed to have a low-pass characteristic with the cut-off frequency at 12.5MHz (half the sampling frequency). Figure 4.17 shows the pre-layout simulation results; the cut-off frequency indicated in the figure is in line with the desired value.
Figure 4.17 TIA AC response (Pre-layout)

Figure 4.18 shows the post-layout results, which are very similar to the results in Figure 4.17.

**Figure 4.18 TIA AC response (Post-layout)**

**Transient:** Figure 4.19 shows the differential output voltage of the TIA for the maximum sum of 8 input currents (192µA peak) in the pre-layout simulation.
Figure 4.19 TIA transient response (pre-layout sim)

Figure 4.20 shows the same results for the post-layout simulation, which are very similar to the pre-layout results.

Figure 4.20 TIA transient response (post-layout sim)

Table 4.6 shows the THD values of the output voltage of the TIA. There is no significant change between the pre-layout and post-layout results. The THD criterion is that it should typically be less than 1% in order to achieve a 40 dB dynamic range.

<table>
<thead>
<tr>
<th>THD</th>
<th>Pre-lay out sim</th>
<th>Post-lay out sim</th>
</tr>
</thead>
<tbody>
<tr>
<td>Value</td>
<td>0.073%</td>
<td>0.080%</td>
</tr>
</tbody>
</table>

4.4. System simulations

Now that each block has been tested separately, it is time to simulate and characterize the performance of the whole signal chain. As a reference, the system has been simulated in MATLAB. The pre-layout and post-layout results are then presented.

The compression scheme was designed to weight the currents and sum them into one output. For a specified seed, one can calculate the output of the summation. Figure 4.21 shows the result of this calculation in MATLAB. Although the input signal and the output current are both sinusoidal, only the final value in each cycle is presented in this plot. In the actual circuit, there are current glitches and transients in the early part of each cycle; therefore, the ADC is timed to sample the result at the end of the cycle. Figure 4.22 shows the transient input current of the TIA. The first 80ns time window is used to program the seed into the LFSRs and is a dead zone. The time origin of Figure 4.21 corresponds to 80ns in the Cadence transient simulations.
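The reference calculation amounts to a per-sample weighted sum of the channel currents. A minimal Python sketch of that operation is given below (the thesis itself uses MATLAB). The table `GAIN_LEVELS` is a hypothetical placeholder for the eight gain levels defined in chapter 3, and the function name is illustrative.

```python
# Hypothetical placeholder for the eight gain levels selected by the 3-bit code;
# the actual levels are those defined in chapter 3.
GAIN_LEVELS = (-4, -3, -2, -1, 1, 2, 3, 4)


def compressed_sample(currents, codes, gains=GAIN_LEVELS):
    """Weighted sum of one sample from each channel in a group.

    currents: per-channel current at the end of the sampling cycle
    codes:    per-channel 3-bit gain code for this cycle
    """
    if len(currents) != len(codes):
        raise ValueError("one gain code is needed per channel")
    return sum(gains[code] * current for current, code in zip(currents, codes))


# Eight channels carrying the same unit current, one channel per gain code.
print(compressed_sample([1] * 8, list(range(8))))
```

In a full reference model, the codes of each channel would be generated per cycle by that channel's LFSR from its programmed seed.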
In order to make a clearer comparison between the two plots, the differential input current of the TIA has also been exported and plotted in MATLAB. Figure 4.23 shows a comparison between the two results. This plot gives more information on the gain errors without showing the transient behavior of the signal.

This current is multiplied by the feedback impedance shown in Figure 4.15. As stated earlier in chapter 2, the frequency characteristic of this feedback should be that of a low-pass filter with the cut-off frequency at half the sampling rate ($f_s/2$). The feedback frequency response is shown in Figure 4.17 and Figure 4.18 for the pre-layout and post-layout simulations, respectively.

In Figure 4.23, some mismatches can be seen between the MATLAB simulations and the circuit simulations. These stem from a number of causes. The MATLAB simulation only sums the individual weighted currents and shows the result. As can be seen in Figure 4.22, the input current of the TIA is not a single-tone sinusoid; it cannot be represented with one frequency, and its spectrum is wide. The parasitic capacitances at the input of the TIA present different impedances at different frequencies. The TIA has been designed to pass as much of the frequency spectrum as possible; however, its bandwidth is limited by the Nyquist criterion. So, the mismatch cannot be explained by a simple gain error; rather, it varies with time. It can also be deduced that the error is not signal dependent, as it does not follow the pattern of a harmonic.
In Figure 4.24, the output voltage of the TIA can be seen. It is sampled by the CSSAR ADC, and the digital signal has been reconstructed using an ideal DAC. The DAC output is shown in yellow. Each sample corresponds to the value just before the clock pulse arrives (the spikes can be distinguished).

Plotting the input current of the TIA in the post-layout simulations is not straightforward, because the current must be extracted exactly at the point where the parasitic impedances are bypassed. Instead, the output voltage of the TIA has been plotted along with the sampled signal. Figure 4.25 shows these signals. The waveform has the same shape as in the pre-layout simulations. Also, the post-layout ADC shows the same performance as in the pre-layout simulations.

Figure 4.25 TIA transient differential output voltage (green) and the reconstructed ADC sample (yellow) – Post-layout
4.5. Power breakdown
In the previous sections of this chapter, the power consumption of the separate blocks was presented. A power consumption breakdown per channel is shown in the pie chart of Figure 4.26. Each channel consumes 798\(\mu\text{A}\) in total.

![Power consumption per channel](image)

Figure 4.26 Power breakdown chart for each channel
5. Chip-level design
In chapter 3, the design and implementation of the individual blocks were presented. In chapter 4, more details were presented along with the simulation results and the layout of each block. However, there are some considerations related to the chip design as a whole that could not be presented in the preceding chapters as they did not fit those subjects. In this chapter, the chip’s floor plan, current generation on the chip, digital sub-blocks, the clock tree and its auxiliary circuitry, power domains, and a few other considerations regarding the test setup of the chip are presented.

5.1. Current generation
For test purposes, and also to be able to calibrate the current manually, the reference current is generated outside the chip. Figure 5.1 shows the 5µA current that is injected into the chip and copied by transistors of the same size. Each transistor has a large length in order to increase its output resistance and thereby the accuracy of the copied current. From the current copy cell, 5µA unit currents are generated and distributed across the chip. Each LNA, trans-conductance amplifier, and TIA uses a 5µA unit current to create its bias voltages. The ADC’s reference buffer and its comparator each require a 10µA unit current.

![Figure 5.1 Current copy cell](image)

5.2. Binary-to-one-hot encoder
The trans-conductance amplifiers use 3 bits to select among 8 gain levels (see section 3.5.1). A binary-to-one-hot encoder is required in each stage. The circuit configuration is shown in Figure 5.2. The Enable (EN) bit, which is shared by all trans-conductance amplifiers, determines whether the binary-to-one-hot encoder is activated or not.
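The behaviour of such an encoder can be summarised in a few lines. The sketch below is a behavioural model only, not the gate-level circuit of Figure 5.2; it assumes that all outputs are low when EN is low, which is not stated explicitly in the text.

```python
def binary_to_one_hot(code, enable=True, n_bits=3):
    """Behavioural model of a binary-to-one-hot encoder with an enable input.

    Returns a list of 2**n_bits outputs. Assumption: every output is 0 when
    the encoder is disabled.
    """
    n_outputs = 1 << n_bits
    if not 0 <= code < n_outputs:
        raise ValueError("code does not fit in n_bits")
    if not enable:
        return [0] * n_outputs
    return [1 if k == code else 0 for k in range(n_outputs)]


print(binary_to_one_hot(0b101))
print(binary_to_one_hot(0b101, enable=False))
```

With the encoder enabled, exactly one of the eight select lines is active for each 3-bit code, so only one gain level is chosen at a time.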
5.3. ADC voltage reference

A resistive divider is used to generate the reference voltage for the SAR ADCs on the chip. However, the output swing of the signal chain is not exactly the same as the range that the ADC was designed for; hence, the reference voltage is set from outside the chip, using a resistive divider with a potentiometer for better calibration, and a pin was allocated for it.

5.4. Clock

5.4.1. Planning and unit design

Generating, distributing, and controlling the clock signals on the chip has certain requirements that should be addressed properly. Only one clock is fed to the chip, and the rest of the clock-related control signals are generated on-chip. Based on these needs, a clock generator and controller was designed. This block also controls the synchronization of the outputs. The requirements for the design of this block are briefly introduced below.

A 100MHz clock is fed into the chip, from which the sampling clock of the four 25MHz ADCs is generated. The four ADC outputs are all sampled at the same time and then fed out in a time-interleaved manner (time-division multiplexing). All of the blocks, including the ADCs and the switches of the trans-conductance amplifiers, use a 25MHz clock, which means that a clock divider is required on the chip. For test purposes, we want to be able to observe the performance of a single ADC while the other ADCs are switched off, so a logic circuit is added to enable this. In addition, due to practical limitations on the number of connections, only 10 pins are allocated for the digital output of the chip, whereas a total of 40 bits are generated in each 25MHz clock cycle by the four ADCs. Each ADC output is therefore first latched in a 10-bit register, and the latched words are fed out one at a time using a 2-bit counter; this counter counts from 00 to 11, and its outputs are connected to the select bits of the output multiplexer. With a reference signal, one can determine which output is being read.
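The output scheme can be illustrated with a simple behavioural model: four latched 10-bit words are sent over one 10-bit bus in four consecutive 100MHz cycles, and the receiver regroups them using the reference position. The function names are illustrative, and the model assumes that the counter starts at 00 with the first ADC.

```python
def multiplex(adc_words):
    """Serialise four latched 10-bit ADC words onto a single 10-bit bus.

    Returns one (select, word) pair per 100 MHz cycle.
    """
    if len(adc_words) != 4:
        raise ValueError("expected one latched word per ADC")
    frame = []
    for select in range(4):                  # 2-bit counter: 00, 01, 10, 11
        word = adc_words[select]
        if not 0 <= word < (1 << 10):
            raise ValueError("ADC words are 10 bits wide")
        frame.append((select, word))
    return frame


def demultiplex(bus_words):
    """Regroup a bus stream that starts at the reference (SYNC) position."""
    usable = len(bus_words) - len(bus_words) % 4
    return [tuple(bus_words[k:k + 4]) for k in range(0, usable, 4)]


frame = multiplex([5, 1023, 0, 512])
print(frame)
print(demultiplex([word for _, word in frame]))
```

Four bus words at 100MHz carry the 40 bits produced in one 25MHz cycle, which is why 10 output pins are sufficient.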

Figure 5.2 3-bit decoder
Figure 5.3 shows a symbol of the **Clock Generation and Control (CGC)** block with its inputs and outputs. The ADC_EN bit is used to program the ADCs’ working scheme, and together with CTRL<1:0> it creates the G1-G4 bits, which are then fed to ADC_1 to ADC_4; they determine whether all four ADCs are working at the same time or a single ADC is selected and the rest are turned off. Please refer to Table 5.1 for the control signal values.

**Table 5.1 CGC unit truth table**

<table>
<thead>
<tr>
<th>Condition</th>
<th>G1</th>
<th>G2</th>
<th>G3</th>
<th>G4</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADC_EN = '1', CTRL&lt;1:0&gt; = xx</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>ADC_EN = '0', CTRL&lt;1:0&gt; = 00</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>ADC_EN = '0', CTRL&lt;1:0&gt; = 01</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>ADC_EN = '0', CTRL&lt;1:0&gt; = 10</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>ADC_EN = '0', CTRL&lt;1:0&gt; = 11</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>
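The truth table can be written compactly as a function of ADC_EN and CTRL<1:0>. The sketch below is a behavioural restatement of Table 5.1 with an illustrative function name; it does not describe the actual gate-level implementation.

```python
def adc_enables(adc_en, ctrl):
    """Return (G1, G2, G3, G4) for the given ADC_EN bit and 2-bit CTRL value."""
    if not 0 <= ctrl <= 3:
        raise ValueError("CTRL<1:0> is a 2-bit value")
    if adc_en:
        return (1, 1, 1, 1)                  # all four ADCs active, CTRL is don't-care
    return tuple(1 if k == ctrl else 0 for k in range(4))


for ctrl in range(4):
    print(f"ADC_EN=0, CTRL={ctrl:02b} ->", adc_enables(0, ctrl))
print("ADC_EN=1           ->", adc_enables(1, 0))
```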

The CLK_25 signal is the general clock used across the chip. SYNC is the reference signal for identifying the corresponding ADC: when the SYNC signal arrives, the first output of the multiplexer corresponds to the first ADC. The Latch signal is used for the output latch.

Aside from the generation and control of the clock, care must be taken in the clock distribution. A symmetrical configuration and enough driving power are the two key elements in designing a clock tree. The following sub-sections give a thorough picture of clock-related designs and considerations.

### 5.4.2. Generation

Figure 5.4 shows the clock divider included in the CGC unit. This circuit generates four non-overlapping clocks with 25MHz rate and 25% duty cycle. The first output, labeled CLK25, is fed out of the CGC and is used as the general clock on the chip. The Reset bit comes from outside of the chip and sets the starting point.
Figure 5.4 Clock Divider

Figure 5.5 shows the timing diagram of the signals in Figure 5.4. When the Reset signal becomes ‘1’, clock generation starts. The SYNC signal is used to determine which ADC is being read, and it corresponds to the last clock phase, CLK25_P4. The Latch signal has a 50% duty cycle and is generated at the 3rd cycle of the 100MHz clock. The application of this signal is described in the following sub-section.
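The four clock phases can be pictured as a one-hot pattern that advances by one position in every 100MHz cycle. The sketch below is an idealised behavioural model with one sample per input clock cycle; it assumes that the first phase is active in the first cycle after reset, and it does not model the Latch signal or any gate delays.

```python
def clock_phases(n_cycles):
    """Idealised (P1, P2, P3, P4) values, one tuple per 100 MHz cycle after reset."""
    return [tuple(int(k == cycle % 4) for k in range(4)) for cycle in range(n_cycles)]


phases = clock_phases(8)

# Each phase is high in one out of every four input cycles: 25 MHz rate, 25 % duty cycle.
duty = [sum(p[k] for p in phases) / len(phases) for k in range(4)]

# Per the text, SYNC corresponds to the last phase, CLK25_P4.
sync = [p[3] for p in phases]

print(phases)
print(duty)
print(sync)
```

Because exactly one phase is high in any cycle, the phases never overlap in this idealised model.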

**5.4.3. Output latch and multiplexing**

Figure 5.6 shows the block diagram of the latch and output multiplexer. On the rising edge of the Latch signal, the ADC outputs are sampled on the registers. See Figure 5.5 for the Latch signal timing with respect to the rest of the control signals. Select bits, $S0$ and $S1$, are generated by the CGC unit.
5.4.4. Distribution

The clock is distributed in a symmetrical manner on the chip. A clock tree was simulated with parasitic capacitance values at the distribution nodes, and clock buffers are added to these nodes. A qualitative schematic of the clock tree is shown in Figure 5.7 which also shows the buffer positions. These signals then enable and disable chopper switches in the output of the trans-conductance amplifier.
5.4.5. Non-overlapping clock

In each trans-conductance amplifier (See section 3.4.2.), a non-overlapping clock is required to switch between the resistors in the source of the input transistors. Figure 5.8 shows the configuration of the circuit. The input of this circuit is a single bit CTRL that is shared among the 32 AFEs.

![Non-overlapping clock generator](image)

5.5. Shift Register programming

To program the trans-conductance amplifiers without requiring an excessive number of input pins, a shift register is used. Figure 5.9 shows the connection of the 32 blocks in series. Each block contains a 10-bit shift register, and they are all connected in a daisy chain. The clock is distributed starting from the last block, while the data enters at the first block; in this way, one can make sure that the data is ready when the clock arrives at each register. Each 10-bit shift register has the feedback described in chapter 3, which generates a pseudo-random sequence (see section 3.5.1 and Figure 3.21). Programming the shift registers requires the feedback path to be opened and the data to be connected to the input; to do so, a 1-bit multiplexer is placed at the input of each shift register. For the configuration of the feedback path and the pseudo-random generated bits, see section 3.5.1.
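The programming phase can be modelled as shifting one long bitstream through the chain of 32 registers. The sketch below is a behavioural illustration only: the bit order within each seed and the order in which the seeds are sent are assumptions, chosen so that the first seed in the list ends up in the first block. Seeds of zero are rejected because an all-zeros register would stop the sequence generation.

```python
N_BLOCKS = 32   # one 10-bit register per channel, as in the text
N_BITS = 10


def build_bitstream(seeds):
    """Serialise the seeds so that seeds[0] ends up in the first block of the chain."""
    bits = []
    for seed in reversed(seeds):             # the seed for the last block is sent first
        if not 0 < seed < (1 << N_BITS):
            raise ValueError("each seed must be a non-zero 10-bit value")
        for b in reversed(range(N_BITS)):    # assumed order: MSB first
            bits.append((seed >> b) & 1)
    return bits


def shift_in(bitstream, length=N_BLOCKS * N_BITS):
    """Clock the bitstream into a chain of flip-flops, one bit per clock."""
    chain = [0] * length                     # chain[0] is the flip-flop at the data input
    for bit in bitstream:
        chain = [bit] + chain[:-1]
    return chain


def read_registers(chain):
    """Recover the 10-bit content of each block from the chain state."""
    return [sum(chain[N_BITS * s + b] << b for b in range(N_BITS))
            for s in range(len(chain) // N_BITS)]


seeds = list(range(1, N_BLOCKS + 1))
bitstream = build_bitstream(seeds)
print(len(bitstream))
print(read_registers(shift_in(bitstream)) == seeds)
```

The full chain holds 320 bits, so programming all seeds takes 320 clock cycles through a single data pin.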
5.6. Power domains

There are several power-supply domains defined in this chip for practical reasons. First, the chip is 5mm long and 1mm wide, which requires very wide power lines in order to reduce the series resistance of the supply lines. There are two columns of power lines distributed on the left side and the right side of the chip (the analog front end sits in between the columns). Second, as described earlier, the LNAs suffer from a rather poor PSRR; hence, a separate power supply that is isolated from the rest of the circuit was provided for them. Third, all of the digital circuitry shares the same supply lines, which keeps the other supplies cleaner. Fourth, the ADCs use their own power supply, and last but not least, the rest of the analog circuits share the common analog supply. Table 5.2 summarizes the power nets across the chip.

Table 5.2 Chip power domains

<table>
<thead>
<tr>
<th>Domain</th>
<th>Power Net</th>
<th>Ground Net</th>
</tr>
</thead>
<tbody>
<tr>
<td>Analog Circuit</td>
<td>AVDD</td>
<td>AGND</td>
</tr>
<tr>
<td>LNA</td>
<td>VDDLNA</td>
<td>GNDLNA</td>
</tr>
<tr>
<td>Digital blocks</td>
<td>VDD!</td>
<td>VSS!</td>
</tr>
<tr>
<td>ADC</td>
<td>VDDADC</td>
<td>GNDADC</td>
</tr>
</tbody>
</table>

All of these domains use a 1.8V supply.

5.7. Floor plan and Full chip

The floor plan of the chip is designed to match the linear array of 32 PZT elements considered for the test. The elements have a 150µm pitch, making the chip at least 4800 µm long. In order to accommodate the seal ring around the chip, an additional 200 µm was added to the length. The chip is 1mm wide; hence, the chip has a rectangular shape with dimensions of 5mm × 1mm.

Figure 5.10 shows the full chip layout. In chapter 4, the layout of each stage was presented individually; now, the whole chip is shown. Because the chip is very long, single elements cannot be distinguished in the figure. Pads are placed on the two sides. A long column of pads has been placed between the LNA/trans-conductance chain and the TIA-ADC blocks. The column consists of the input nodes of the TIAs, the digital supply, and the analog supply. The empty spaces between the TIA-ADC blocks are filled with metal to meet the density rules. The output multiplexer is placed in the center of the chip.

![Full Chip layout](image)

---

The chip has been fabricated in TSMC 0.18µm MSRF technology. A micrograph of the chip is shown in Figure 5.11. The black pieces are density fillers. The rest can be compared to the full chip layout.
Figure 5.12 shows a zoomed-in view of Figure 5.11, in which more details are visible. The group is almost symmetrical except for the middle element, which has less dummy-filled area. The reason is that a dummy LNA that creates the bias voltage for the input of the trans-conductance amplifiers was placed there. The MOM capacitors of the ADC are also visible in this figure.
5.8. Interconnections
After the chip is fabricated, it should be integrated with the linear array of PZT elements. As already explained in section 5.7, the chip floor plan is designed to match the linear array, and the elements should be connected to the inputs of the chip. This can be done in two ways: one can either wire bond each element directly to the corresponding pad on the ASIC, or make the connection via a trace on the PCB. Figure 5.13 illustrates the configuration of the interconnections and the test board. For this project, wire bonding that connects the pads directly to the transducers, and not through the PCB, is planned. The transmit elements (denoted by TX), on the two sides of the receive elements, are both bonded to the PCB.

![Interconnections](image)

Figure 5.13 Interconnections

5.9. Test chip malfunction and solution
This prototype has an error that has to be fixed before any measurements can take place. Each of the four ADCs on the chip requires a current sink as the reference for its comparator; however, the reference currents are mistakenly provided as current sources, which turns the comparators off, so the ADCs do not work. Moreover, due to the tight project schedule, there was not enough time to add an analog output to the output pins, so there is currently no way to measure the chip performance. Once this problem is fixed and the chip is taped out again, experiments can be planned to measure the chip characteristics.

5.10. Experimental planning
In the test setup considered for the chip, two transmit elements are used to generate broad transmit beams. They are excited with high-voltage pulses, and the echo signals from the surfaces of the objects are received by the 32 receive elements. Figure 5.14 shows the planned experiment. A needle in a water tank can serve as a test phantom for a first demonstration of the chip performance.
5.11. Conclusion

In this chapter, the floor plan of the chip has been justified. Auxiliary circuits related to the operation of the whole chip have been introduced, and the connections of the chip for test purposes have been described. Finally, a test setup devised for measuring the chip has been presented.

Chapter 6 concludes this design project and discusses future steps that can be taken to improve the work.
6. Conclusion
6.1. Thesis contribution

The main goal of this thesis was to exploit the redundancy that exists in the spatial domain of ultrasound RF signals. A recently developed compressive scheme from optical imaging was adopted; in optics, this scheme has been shown to be effective when used properly. Based on it, a system for ultrasound, i.e., for mechanical waves, was designed and simulated by X. Li [15] and P. van der Meulen [17]. These investigations were the starting point of this work.

During the course of this project, an ultrasound receiver ASIC has been developed to implement the aforementioned scheme.

This ASIC has to meet goals defined in two separate domains:

- The system accurately implements the mathematical algorithm, i.e., compressive sensing.
- The system satisfies typical ultrasound system constraints.

In order to design this chip, a number of techniques were used:

- An inverter-based amplifier was used in both the first and the last stage of the receive signal chain, which increases the power efficiency.
- A pseudo-random sequence generator was designed, simulated, and implemented on the chip to satisfy the mathematical requirements. The sequence produced by this design was also used in the MATLAB simulations of the compressive sensing system to check that it is sufficiently random, and its functionality was verified.
- The circuit outputs are sampled and read out in a time-interleaved manner. Dedicated control bits provide considerable freedom in observing the performance of a selected ADC: the other ADCs can be switched off, or a single ADC can be observed in the presence of crosstalk.
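To illustrate the kind of pseudo-random sequence generator mentioned in the list above, the following is a minimal Python sketch of a Fibonacci linear-feedback shift register (LFSR). It is not the circuit implemented on the chip: the register length, the tap positions (the primitive polynomial x^4 + x^3 + 1), the seed, and the function names `lfsr_bits` and `to_signs` are all assumptions chosen only to keep the example small.

```python
from typing import List, Sequence


def lfsr_bits(seed: int, taps: Sequence[int], nbits: int, count: int) -> List[int]:
    """Return `count` output bits of a Fibonacci LFSR.

    `taps` uses the common 1-based polynomial notation, so taps (4, 3)
    with nbits=4 corresponds to x^4 + x^3 + 1 (an illustrative choice,
    not the polynomial used on the chip).
    """
    if not 0 < seed < (1 << nbits):
        raise ValueError("seed must be non-zero and fit in nbits")
    state = seed
    out = []
    for _ in range(count):
        out.append(state & 1)  # the output is the least significant bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> (nbits - t)) & 1  # XOR of the tapped bits
        # shift right and insert the feedback bit at the top
        state = (state >> 1) | (feedback << (nbits - 1))
    return out


def to_signs(bits: Sequence[int]) -> List[int]:
    """Map bits {0, 1} to weights {-1, +1}, as a random sign sequence."""
    return [1 if b else -1 for b in bits]


if __name__ == "__main__":
    bits = lfsr_bits(seed=1, taps=(4, 3), nbits=4, count=15)
    print(bits)
    print(to_signs(bits))
```

With a primitive polynomial, an n-bit LFSR cycles through all 2^n - 1 non-zero states before repeating, so the 4-bit example has a period of 15; a longer register would be needed for a sequence long enough to be useful in practice.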

Finally, the most important contribution of this thesis is that it promises image reconstruction with an 8x reduction in cable count. However, two pulse-echo measurements are required for the image reconstruction, which makes the effective reduction 4x.
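The arithmetic behind these two figures can be written out as a short sketch. It assumes that the 8x figure corresponds to the 32 receive elements being read out through the four ADC channels of the chip; this mapping is an interpretation rather than a statement from the design itself, and the function name `cable_reduction` is hypothetical.

```python
from typing import Tuple


def cable_reduction(num_elements: int, num_outputs: int,
                    num_acquisitions: int) -> Tuple[float, float]:
    """Return (raw, effective) reduction factors.

    raw       : elements per output channel in a single acquisition
    effective : raw factor divided by the number of pulse-echo
                measurements needed to reconstruct one image
    """
    raw = num_elements / num_outputs
    effective = raw / num_acquisitions
    return raw, effective


if __name__ == "__main__":
    # Assumed mapping: 32 elements, 4 output channels, 2 pulse-echo measurements
    print(cable_reduction(32, 4, 2))
```

Under these assumptions, the raw factor is 32 / 4 = 8 and the effective factor is 8 / 2 = 4, which matches the figures quoted above.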
6.2. Future work

The first and foremost step after this work is to measure the fabricated silicon. There are always unforeseen behaviors that have to be quantified and compensated for.

However, one issue has to be fixed before the experimental verification can be carried out. The current references of the four SAR ADCs are misconnected in such a way that all of the ADCs are off, and unfortunately there is no analog output for test purposes.

Aside from the measurements, there are a number of steps that can be taken in the future.

This imaging scheme is not restricted to a certain type of ultrasound probe and can be applied in all sorts of ultrasound tools: a TEE probe that goes through the esophagus, an IVUS probe that images the inside of an artery, or an ICE probe intended for intra-cardiac visualization can all benefit from this scheme.

This test chip employs a linear array of 32 transducers for imaging, which only yields a 2D image for cross-sectional visualization. However, the scheme can be mapped onto a matrix of transducer elements and used for 3D imaging.

This prototype is intended to reduce the cable count, while the chip itself has a substantial number of pins on each side. On one side, 32 pins are provided to be wire bonded to the transducer elements. If the array of PZT elements could be integrated on the ASIC, the number of connections would decrease. Also, a number of control signals, along with the reference current, could be generated on chip, whereas they are currently supplied from outside the chip.

The analog front end (AFE) was intended to have an average power consumption of 1 mW per channel. The AFE was combined with an ADC to obtain a digitized output. The AFE on its own satisfies the power budget, but combined with the ADC it exceeds the limit. Two methods are suggested to make the design more power efficient. First, although the AFE is already relatively efficient and presumably leaves little room for power improvement, two of its three stages might be combined to reach a lower current consumption. Second, the ADC comparator was designed for high speed, and there is a lot of room for optimizing the power consumption of this dynamic comparator. A combination of these methods can also be applied if needed.

Finally, to make this work complete, this ASIC can be turned into a complete transceiver. By adding transmit elements on top of the ASIC and including high-voltage pulser circuits, the chip can be integrated at the tip of the probe and used for imaging without auxiliary elements.
### List of Abbreviations

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>AFE</td>
<td>Analog Front End</td>
</tr>
<tr>
<td>ASIC</td>
<td>Application Specific Integrated Circuit</td>
</tr>
<tr>
<td>CASFVF</td>
<td>Cascoded Flipped Voltage Follower</td>
</tr>
<tr>
<td>CGC</td>
<td>Clock Generation and Control</td>
</tr>
<tr>
<td>CMFB</td>
<td>Common-mode Feedback</td>
</tr>
<tr>
<td>CNR</td>
<td>Contrast-to-Noise Ratio</td>
</tr>
<tr>
<td>CS</td>
<td>Compressive Sensing</td>
</tr>
<tr>
<td>CSSAR</td>
<td>Charge-Sharing SAR</td>
</tr>
<tr>
<td>DCSL</td>
<td>DC Servo Loop</td>
</tr>
<tr>
<td>ECG</td>
<td>Electrocardiography</td>
</tr>
<tr>
<td>GBW</td>
<td>Gain-Bandwidth Product</td>
</tr>
<tr>
<td>ICE</td>
<td>Intra-Cardiac Echocardiography</td>
</tr>
<tr>
<td>IVUS</td>
<td>Intravascular Ultrasound</td>
</tr>
<tr>
<td>LFSR</td>
<td>Linear-Feedback Shift Register</td>
</tr>
<tr>
<td>LNA</td>
<td>Low Noise Amplifier</td>
</tr>
<tr>
<td>NTF</td>
<td>Noise Transfer Function</td>
</tr>
<tr>
<td>OTA</td>
<td>Operational Trans-conductance Amplifier</td>
</tr>
<tr>
<td>PSD</td>
<td>Power Spectral Density</td>
</tr>
<tr>
<td>PSRR</td>
<td>Power Supply Rejection Ratio</td>
</tr>
<tr>
<td>PZT</td>
<td>Lead Zirconate Titanate</td>
</tr>
<tr>
<td>SAR</td>
<td>Successive Approximation Register</td>
</tr>
<tr>
<td>SINAD</td>
<td>Signal-to-Noise-And-Distortion ratio</td>
</tr>
<tr>
<td>TEE</td>
<td>Transesophageal Echocardiogram</td>
</tr>
<tr>
<td>TGC</td>
<td>Time-Gain Compensation</td>
</tr>
<tr>
<td>TIA</td>
<td>Trans-Impedance Amplifier</td>
</tr>
<tr>
<td>TTE</td>
<td>Transthoracic Echocardiogram</td>
</tr>
</tbody>
</table>


[38] J. Ramirez-Angulo, S. Gupta, I. Padilla, R.G. Carvajal, A. Torralba, M. Jimenez, F. Munoz, "Comparison of conventional and new flipped voltage structures with increased input/output


