High Efficiency IQ-sharing Transmitter

Lei Chen
Thesis

submitted in partial fulfillment of the requirements for the degree of

Master of Science

in

Electrical Engineering

by

Lei Chen
born in Nanjing, P.R. China

This work was performed in:

IMEC, Holst Centre
High Tech Campus 31
5656 AE Eindhoven
Netherlands

The work in this thesis was supported by IMEC. Their cooperation is hereby gratefully acknowledged.

The undersigned hereby certify that they have read and recommend to the Faculty of Electrical Engineering, Mathematics and Computer Science for acceptance a thesis entitled “High Efficiency IQ-sharing Transmitter” by Lei Chen in partial fulfillment of the requirements for the degree of Master of Science.

Dated: October 25, 2016

Supervisor: Prof. dr. Leo de Vreede

Industry Supervisor: Ir. Ao Ba

Committee Members:

Dr. ir. Morteza Alavi

Dr. ir. Massoud Babaie

Dr. ir. Michiel Pertijs

Table of Contents

Acknowledgments

1 Introduction
  1.1 IoT and 802.11
  1.2 Transmitter specification
    1.2.1 Modulation
    1.2.2 Frequency band
    1.2.3 Power consumption and energy efficiency
    1.2.4 Output power
    1.2.5 Conclusion
  1.3 Outline of the thesis

2 Digital transmitter design
  2.1 State-of-the-art transmitter architectures
    2.1.1 Digital polar architecture
    2.1.2 Digital outphasing architecture
    2.1.3 Digital quadrature architecture
  2.2 Proposed architecture
    2.2.1 General consideration
    2.2.2 Digital processor
    2.2.3 Sign bit selector
    2.2.4 Retiming
    2.2.5 PA stage
      2.2.5.1 Resolution
      2.2.5.2 Encoding consideration
      2.2.5.3 IQ-sharing
      2.2.5.4 Constellation consideration
  2.3 Conclusion

3 Digital power amplifier design
  3.1 General considerations
    3.1.1 Output power, efficiency and linearity
    3.1.2 Current-mode PAs and switch-mode PAs
    3.1.3 Single-ended and differential PAs
  3.2 State-of-the-art DPA architectures
    3.2.1 Current source based DPA
    3.2.2 Switch-based DPA
  3.3 DPA design and optimization
    3.3.1 The principle of SCPA
    3.3.2 SCPA design and optimization
  3.4 Sign bit selector
    3.4.1 Divider design
    3.4.2 MUX design
  3.5 Conclusion

4 Digital processor design
  4.1 Data demux and upsampling
  4.2 FIR filter
  4.3 Code converter
  4.4 Saturater
  4.5 Conclusion

5 Simulation results
  5.1 Output power and efficiency
  5.2 Linearity
  5.3 Power breakdown
  5.4 Comparison with state-of-art works
  5.5 Layout overview

6 Conclusion
  6.1 My contributions
  6.2 Future work

List of Figures

1.1 Internet-of-Things schematic showing different applications.
1.2 The multipath propagation effect (Courtesy of Razavi’s RF Microelectronics [10]).
1.3 The illustration of OFDM (Courtesy of Razavi’s RF Microelectronics [10]).

2.1 Envelope elimination and restoration.
2.2 The improved version of analog polar modulation.
2.3 The architecture of digital polar transmitter.
2.4 The system of outphasing.
2.5 The architecture of an analog quadrature transmitter.
2.6 The architecture of a digital quadrature transmitter.
2.7 The architecture of proposed IQ-sharing digital quadrature transmitter.
2.8 Illustration of EVM
2.9 Clock signals in 4 phases with different duty cycles
2.10 The fundamental components of the 25% duty cycle clocks show as unit vectors.
2.11 The fundamental components of the 50% duty cycle clocks.
2.12 The fundamental component of a square wave vs the duty cycle.
2.13 The proposed transmitter architecture overview.
2.14 The digital processor overview.
2.15 The sign bit selector module overview.
2.16 Glitches are generated if amplitude signal and clock signal are misaligned.
2.17 The retiming mechanism.
2.18 The block diagram of retiming.
2.19 Different combination of $x$ bits encoded in binary and $(6-x)$ bits encoded in unary.
2.20 Separate PA banks in the quadrature transmitter.
2.21 PA unit cells in the IQ-sharing quadrature transmitter.
2.22 The constellation points of proposed IQ sharing architecture.
2.23 The final arrangement of PA cells.
2.24 A general case of the PA: (I, Q) = (13, 27).

3.1 The general structure of a current source PA.
3.2 The drain voltage and drain current of current-mode PAs at full drive.
3.3 The schematic of a class-F PA
3.4 The schematic of a class-E PA
3.5 Three conditions are required for class-E PAs
3.6 The schematic of a class-D PA
3.7 The drain voltage and current of a class-D PA
3.8 The effect of current traveling back due to bonding wire.
3.9 The problems are alleviated in a differential PA.
3.10 An example of the current mode DPA
3.11 An example of the switch-based DPA
3.12 The block diagram of a single-ended SCPA
3.13 The simplified model of a single-ended SCPA
3.14 The Thevenin equivalent circuit of a single-ended SCPA
3.15 The input capacitance of the capacitive voltage divider.
3.16 A model of SCPA considering other losses.
3.17 A model of SCPA showing all the losses.
3.18 The peak output power of SCPA with different $R_{opt}$ and $C_1$.
3.19 The peak efficiency of SCPA with different $R_{opt}$ and $C_1$.
3.20 The probability distribution pattern of QPSK with OFDM.
3.21 The average efficiency of SCPA with different $R_{opt}$ and $C_1$.
3.22 The peak output power of SCPA with different $R_{on}$.
3.23 The average efficiency of SCPA with different $R_{on}$.
3.24 $L_1$, $L_2$ and $C_2$ values.
3.25 The SCPA with ESD model.
3.26 Voltage swing at $V_{pad}$.
3.27 The sign bit selector module overview.
3.28 The proposed divider schematic.
3.29 The waveform of proposed frequency divider.
3.30 Other frequency divider designs.
3.31 The MUX schematic.

4.1 The data demux and upsampling.
4.2 The operation of data demux and upsampling.
4.3 Impulse response of the FIR filter.

5.1 Output power versus the efficiency of the power amplifier at 900 MHz.
5.2 Output voltage of the transmitter versus the digital code.
5.3 INL of the power amplifier.
5.4 Output spectrum of the transmitter with 8 MHz 64-QAM data packets
5.5 Close-in view of the output spectrum without the saturater.
5.6 The 64-QAM constellation.
5.7 The power breakdown.
5.8 Layout overview.

List of Tables

1.1 ISM frequency bands available for 802.11ah.
1.2 Design specification of the transmitter
2.1 The EVM of 2 kinds of duty cycle clocks with 1 ps mismatch
3.1 The comparison of current-mode PAs
4.1 Conversion from 2’s complement to the required format
5.1 Performance summary and comparison with other digital transmitters

Acknowledgments

This work would not have been possible without the help of many people. It is my great pleasure to express my sincere appreciation to all those who supported me in this project and throughout my studies.

First and foremost, I would like to express my deepest gratitude to Professor Leo de Vreede. Two years ago, I moved to Delft, Netherlands. He introduced me to the interesting world of RF through his lively and humorous teaching in all the RF-related courses. His patient explanations and responsible guidance helped me gain a solid understanding of RF. Without the opportunity he offered and his extensive guidance during the project, this work could not have been finished.

Second, special thanks go to my daily supervisor at IMEC-NL, Ir. Ao Ba. He gave me valuable advice and hands-on instructions. He is a true gentleman who is willing to share his knowledge and experience whenever I need his help. Moreover, whenever I needed help from another group, he was always enthusiastic about helping me get the resources. I learned a lot from his expertise.

Third, I am particularly grateful to Dr. ir. Morteza Alavi. He brought valuable suggestions to me in our regular meetings. Furthermore, he carefully and patiently reviewed every detail of this thesis. I really appreciate his support in this process.

I also want to thank all my colleagues at IMEC-NL, who supported me a lot whenever I faced a practical issue during the project.

Last but not least, I would like to thank my family and friends, who supported and encouraged me with their understanding and company.

Lei Chen
Delft University of Technology
October 25, 2016

Chapter 1

Introduction

1.1 IoT and 802.11

Internet-of-Things (IoT) [1], which represents a new revolution in computing and communication, has drawn substantial attention in recent years. There is no single definition of IoT that is accepted by all groups, including researchers, developers, and practitioners. However, over the last decade the concept of IoT has matured into a foundation that connects sensors, actuators, and all kinds of wireless smart "things" and embeds them seamlessly into the environment around us, thus enabling not only human-to-human but also human-to-machine and even direct machine-to-machine communication. Moreover, the development of cloud computing, usually integrated with IoT, brings virtually unlimited storage, computing, and analysis power to IoT. Once connected into a network, everyday objects can sense and analyze the environment, and understand and respond to “anything” at “anytime” from “anywhere” [2]. The application areas of IoT are illustrated in figure 1.1 [2].

Because the foundation of IoT is connection, wireless communication plays an important role. Nowadays, more and more devices rely on wireless communication because of its easy access, portability, and reliability. IEEE 802.11 Wireless Local Area Network (WLAN), now one of the most popular wireless standards, operates in the 2.4 GHz and 5 GHz ISM (Industrial, Scientific, and Medical) radio bands. The increasing demand for wireless access and network deployment, however, is saturating the 2.4 GHz and 5 GHz bands, and the resulting mutual interference, such as data packet collisions, can cause a significant performance decrease. In addition, the short wavelength at 2.4 GHz and 5 GHz limits the propagation range, so networks deploying IEEE 802.11 and its amendments a/b/g/n are mostly restricted to indoor or close-range applications. Another problem is the poor spectral efficiency of the simple modulation schemes used by low-power standards in these bands (GFSK for BLE and OQPSK for ZigBee). This limits the capacity of the network as the ISM band becomes crowded. Moreover, the link robustness against fading and interference needs to be considered, especially in conditions such as urban environments.

Because of these drawbacks, a new WLAN technology named IEEE 802.11ah has been developed. The standardization work was launched by TGah in November 2010 [3]. In January 2016, the Wi-Fi Alliance announced the “Wi-Fi HaLow” [4] designation for devices using 802.11ah technology. It utilizes sub-GHz frequency bands and can provide extended range for large-scale applications such as wide-area sensor networks, sensor backhaul systems, and potential Wi-Fi off-loading functions [5]. At the same time, the use of OFDM modulation improves spectral efficiency as well as link robustness against fading and interference. The supported bandwidths range from the mandatory 1 MHz and 2 MHz modes to an optional 16 MHz mode.

1.2 Transmitter specification

In OFDM transmitters, the traditional Cartesian (quadrature) architecture is widely used for its wide modulation bandwidth [6] [7] [8]. At the same time, a wireless transceiver usually consumes most of the power budget in an IoT network, so energy efficiency is also a key factor in transmitter design. The design specifications are analyzed below.

1.2.1 Modulation

In wireless communication, a transmitter is designed to work together with the receiver, which means they should use the same modulation scheme. In practice, however, a problem called “multipath propagation” [9] needs to be considered. This issue arises because the electromagnetic waves travel along different paths from the transmitter (TX) to the receiver (RX). For example, one wave travels directly from the TX to the RX, while another is reflected by a wall before reaching the destination, as shown in figure 1.2 [10].

The resulting delay spread at the RX can lead to “intersymbol interference” (ISI), and the effect becomes worse at higher bit rates. To alleviate it, a technique called “orthogonal frequency division multiplexing” (OFDM) is employed. Consider a single-carrier signal with a data rate of $r_b$, which occupies a certain bandwidth. In OFDM, the data is first demultiplexed by a factor of $N$ before transmission, which generates $N$ branches, each with a data rate of $r_b/N$. Because each branch has a much lower bit rate, the ISI is alleviated. In the end, a multi-carrier system is realized in which each branch uses its own subcarrier and occupies a small bandwidth, while the total bandwidth remains the same as that of the original single-carrier spectrum. This process is shown in figure 1.3 [10].
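The demultiplexing step described above can be illustrated with a minimal Python sketch. The round-robin split, the function names and all numbers below are illustrative assumptions rather than values from this thesis; the sketch only shows that each of the $N$ branches carries $r_b/N$ and therefore has an $N$ times longer symbol period.

```python
def demultiplex(bits, n_branches):
    """Split a serial bit stream into n_branches parallel streams (round-robin)."""
    return [bits[k::n_branches] for k in range(n_branches)]


def branch_rate(r_b, n_branches):
    """Data rate carried by each branch after demultiplexing by n_branches."""
    return r_b / n_branches


# Illustrative numbers only (assumed, not taken from the thesis).
r_b = 8e6                          # single-carrier data rate in bit/s
n = 4                              # demultiplexing factor N
bits = [1, 0, 1, 1, 0, 0, 1, 0]    # example serial data

branches = demultiplex(bits, n)
per_branch = branch_rate(r_b, n)

# A lower rate per branch means a longer symbol period, so the same
# delay spread covers a smaller fraction of each symbol.
symbol_time_single = 1.0 / r_b
symbol_time_branch = 1.0 / per_branch
stretch = symbol_time_branch / symbol_time_single
```

With these assumed numbers, every branch runs at a quarter of the original rate, so a given delay spread overlaps a four times smaller fraction of each symbol.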

1.2.2 Frequency band

The goal of the IEEE 802.11ah standard is to provide a global wireless connection that operates within the unlicensed ISM (Industrial, Scientific, and Medical) band below 1 GHz. Several ISM bands are available for 802.11ah in different regions of the world; they are summarized in table 1.1 [3].

**Table 1.1:** ISM frequency bands available for 802.11ah.

<table>
<thead>
<tr>
<th>Region</th>
<th>Frequency band (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>China</td>
<td>755-787</td>
</tr>
<tr>
<td>Europe</td>
<td>863-868</td>
</tr>
<tr>
<td>Japan</td>
<td>916.5-927.5</td>
</tr>
<tr>
<td>Korea</td>
<td>917.5-923.5</td>
</tr>
<tr>
<td>Singapore</td>
<td>866-869 &amp; 920-925</td>
</tr>
<tr>
<td>USA</td>
<td>902-928</td>
</tr>
</tbody>
</table>
Figure 1.3: The illustration of OFDM (Courtesy of Razavi’s RF Microelectronics [10]). (a) Single-carrier transmission with high data rate.

Because these frequency bands are all located around 900 MHz, this frequency is taken as the carrier frequency in this project. Only minor modifications are needed for compatibility with the standard of a particular region, so this work remains valid.

At the same time, IEEE 802.11ah supports multiple modes of operation associated with different bandwidths, namely 1, 2, 4, 8 or 16 MHz [3]. The bandwidth supported in this project is up to 8 MHz.

1.2.3 Power consumption and energy efficiency

IEEE 802.11ah aims to be a low-power standard, so power consumption is a key consideration when designing a transmitter for applications based on this protocol. In advanced CMOS technology, energy-efficient wireless solutions rely on a high level of integration, which favors digitally intensive approaches over traditional analog RF solutions [11]. This is because digitally intensive circuits benefit from lower dynamic power consumption and higher switching speed [12] as the minimum feature size and supply voltage scale down. Therefore, this project focuses on the design of a fully digital transmitter. Recent state-of-the-art OFDM digital transmitters consume hundreds of milliwatts [13] [14] [15], whereas this work targets a power consumption below 10 mW.

Due to the large reduction of targeted power consumption, much higher system efficiency should be expected, which could not only lead to longer battery life but also better mobility of the device. The expected system average efficiency is at least 10%.

At the same time, the power amplifier (PA) is the most power-consuming part of a transmitter.
Therefore, this project aims to reach a peak PA efficiency of 40%, which is a reasonable number compared to recent state-of-the-art works [8] [16] [17].

1.2.4 Output power

The output power level of a transmitter should be defined based on the link budget [18]. Generally speaking, in a transceiver system the output power of the transmitter is attenuated by propagation through the medium, and the power arriving at the receiver must be larger than the receiver sensitivity. In a wireless system, the medium is free space, and the attenuation is usually modeled as path loss (PL). Although several complex models for path loss exist [19], a simple equation [20] is used in this thesis to illustrate the main idea:

$$PL_{dB} = 10 \gamma_{fs} \log_{10} \left( \frac{4 \pi f_o d}{c} \right)$$

where $PL_{dB}$ is the path loss in dB, $\gamma_{fs}$ is the attenuation coefficient, $f_o$ is the carrier frequency, $c$ is the speed of light and $d$ is the distance from the transmitter to the receiver. Note that operating in a sub-GHz band yields a modest path loss, which in turn relaxes the required transmitter peak output power. In this project, the average output power is set to 2 dBm. Because OFDM modulation is adopted, the peak-to-average power ratio (PAPR) needs to be considered, which means that the transmitter should operate efficiently at both the peak and the average power level. Therefore, the intended peak output power is 10 dBm.
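As a rough numerical illustration of this link-budget reasoning, the sketch below evaluates the path loss in its standard log-distance form, in which the argument of the logarithm is $4 \pi f_o d / c$. The 100 m distance and the free-space value $\gamma_{fs} = 2$ are assumptions chosen for illustration, not values from this thesis. The sketch also converts the 2 dBm and 10 dBm targets to milliwatts.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def path_loss_db(f_o, d, gamma_fs=2.0):
    """Log-distance path loss in dB; gamma_fs = 2 is assumed (free space)."""
    return 10.0 * gamma_fs * math.log10(4.0 * math.pi * f_o * d / C)


def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


d = 100.0                           # assumed link distance in metres
pl_900 = path_loss_db(900e6, d)     # sub-GHz carrier used in this project
pl_2g4 = path_loss_db(2.4e9, d)     # conventional 2.4 GHz WLAN carrier
saving_db = pl_2g4 - pl_900         # path-loss advantage of the sub-GHz band

p_avg_mw = dbm_to_mw(2.0)           # average output power target
p_peak_mw = dbm_to_mw(10.0)         # peak output power target
backoff_db = 10.0 - 2.0             # peak-to-average difference in dB
```

Under these assumptions the sub-GHz carrier saves $20 \log_{10}(2.4/0.9)$ dB of path loss relative to 2.4 GHz at any distance, and the 8 dB difference between the peak and average targets corresponds to output powers of 10 mW and about 1.6 mW.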

1.2.5 Conclusion

In conclusion, the specifications of the proposed digital transmitter are summarized in table 1.2.

**Table 1.2:** Design specification of the transmitter

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Specification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modulation</td>
<td>OFDM</td>
</tr>
<tr>
<td>Frequency band</td>
<td>IEEE 802.11ah 900 MHz, up to 8 MHz bandwidth</td>
</tr>
<tr>
<td>Average output power</td>
<td>2 dBm</td>
</tr>
<tr>
<td>Peak output power</td>
<td>10 dBm</td>
</tr>
<tr>
<td>System efficiency</td>
<td>&gt;10%</td>
</tr>
<tr>
<td>PA efficiency</td>
<td>&gt;40% (at peak), as high as possible (at average)</td>
</tr>
</tbody>
</table>

1.3 Outline of the thesis

This thesis consists of 6 chapters. The first chapter introduces the project, covering the background information, motivation, and challenges, and discusses the design specifications.

Chapter 2 presents the design of the digital transmitter. It first compares several state-of-the-art digital transmitter architectures and discusses their advantages and disadvantages. The proposed transmitter architecture is then introduced, and the design principles of all main modules, such as the digital processor and the power amplifier, are covered. The IQ-sharing idea is discussed as well.

Chapter 3 focuses on the design of the digital power amplifier (DPA). State-of-the-art DPA architectures are compared first. Then, the DPA used in this project is analyzed. The design procedure and optimization are the primary objectives of this chapter.

Chapter 4 shows the operation of the digital processor used in this project. Several key modules are discussed.

Chapter 5 gives the simulation results of the proposed transmitter. A comparison with other state-of-the-art transmitters is also given.

Chapter 6 draws the conclusion of the thesis. My contributions are summarized and practical suggestions are proposed for future work.

Chapter 2

Digital transmitter design

As discussed before, this project focuses on digital transmitter design. In this chapter, several state-of-the-art transmitter architectures are first introduced and their strengths and weaknesses are compared. Then, the proposed architecture is discussed in detail, including the design considerations and all the important modules.

2.1 State-of-the-art transmitter architectures

2.1.1 Digital polar architecture

Polar modulation originates from a linearization technique called “envelope elimination and restoration” (EER) [21], whose basic idea is shown in figure 2.1. It is a classical method for improving the linearity and efficiency of a radio transmitter. The RF signal $V_{in}$ is first decomposed into its envelope and phase components. These are then processed in their own paths and eventually combined in the power amplifier. Since the phase component is a constant-envelope signal, the PA can be nonlinear and thus efficient.

Figure 2.1: Envelope elimination and restoration.

However, this structure performs the decomposition in the RF domain, which leads to at least two problems. First and foremost, the limited bandwidth of the limiter, which extracts the phase signal, causes unavoidable nonlinearity and AM-PM distortion [10], leading to inferior EVM and spectral purity. Second, RF-domain operation is more power hungry. Therefore, an improved polar modulation scheme is depicted in figure 2.2. Here the envelope $A(t)$ and the phase $\phi(t)$ are computed from the I/Q data as:

\[ A(t) = \sqrt{I^2(t) + Q^2(t)} \]  
\[ \phi(t) = \tan^{-1}\left(\frac{Q(t)}{I(t)}\right) \]

Then, the envelope data is converted to an analog signal that controls a low-dropout (LDO) regulator so that the supply of the PA can track the envelope. The phase data is used to control the PLL to generate the required phase signal. However, the LDO limits the bandwidth of this analog polar transmitter. Moreover, the AM-PM characteristic of the PA entails nonlinearity (e.g., caused by output capacitance nonlinearity [10]). Complex feedback techniques are usually necessary to correct these issues.
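The envelope and phase decomposition of the I/Q data can be sketched in a few lines of Python. This is a floating-point illustration only. It uses `math.atan2`, the four-quadrant form of $\tan^{-1}(Q/I)$, which is an assumption about how the phase is unwrapped. A hardware implementation would typically use a CORDIC, which is not modeled here.

```python
import math


def iq_to_polar(i, q):
    """Return (A, phi) with A = sqrt(I^2 + Q^2) and phi = atan2(Q, I)."""
    amplitude = math.hypot(i, q)
    phase = math.atan2(q, i)  # four-quadrant arctangent, result in (-pi, pi]
    return amplitude, phase


def polar_to_iq(amplitude, phase):
    """Inverse mapping, used here only to check the decomposition."""
    return amplitude * math.cos(phase), amplitude * math.sin(phase)


# Example sample (illustrative values).
a, phi = iq_to_polar(3.0, 4.0)
i_back, q_back = polar_to_iq(a, phi)

# A sample on the negative I axis: a plain tan^-1(Q/I) would return 0 here,
# whereas the four-quadrant form returns pi.
a_neg, phi_neg = iq_to_polar(-1.0, 0.0)
```

The second example shows why the quadrant matters: a phase modulator driven by a two-quadrant arctangent would place the sample at the wrong side of the constellation.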

To further exploit polar modulation, the digital polar architecture, which has drawn substantial interest in recent years, employs a digital power amplifier (DPA). This digital PA stage usually consists of several PA unit cells that are directly modulated by the digital code, in this case the envelope data. The output power is directly proportional to the number of enabled cells. This structure is shown in figure 2.3.

Figure 2.3: The architecture of digital polar transmitter.

Much research has focused on this digital polar architecture [22] [14] [23] [16]. Ye, Lu, et al. [22] used an inverse class-D cascode PA sliced into 8 bits for the RF-DAC and a 9-bit phase modulator, achieving a high peak output power of 23.3 dBm with 19.3% peak system efficiency. This work used a complex power combiner to modulate the impedance seen at the drain of the PA output to boost the output swing and efficiency, but this impedance modulation also introduces AM-AM nonlinearity, which needs extra calibration. Yoo, Sang-Min, et al. [23] used a switched-capacitor digital power amplifier to implement their transmitter; details of this PA will be discussed in the next chapter. To improve the energy efficiency for signals with a large peak-to-average power ratio (PAPR), two power supplies are used, so a class-G PA is effectively implemented. With careful encoding of the switching sequence, they managed to keep the DC power close to the PA output power, which improves the efficiency. Because of the switched-capacitor power amplifier (SCPA), the linearity is also superior. Zheng, Shiyuan, and Howard C. Luong [14] proposed a WCDMA/WLAN digital polar transmitter using a common-source DPA array and a replica PA for linearization in the AM path. Ba, Ao, et al. [16] presented an IEEE 802.11ah fully digital polar transmitter using a switched-capacitor DPA and an ADPLL. A system efficiency of 14% is achieved, and digital pre-distortion is not needed because of the good linearity of the SCPA.

As the works discussed above show, the polar architecture can offer high efficiency and linearity. However, envelope variations can modulate the output impedance and introduce nonlinearity. Although a switched-capacitor PA helps with the output impedance, the polar architecture still suffers from the delay mismatch between the AM path and the PM path. In addition, the bandwidth expansion caused by the CORDIC limits the bandwidth of the input signals.

2.1.2 Digital outphasing architecture

Envelope variations, which are a problem in the polar architecture, can be avoided by using the outphasing architecture [24]. Outphasing is also a classical technique for improving the efficiency and linearity of the PA. The system is shown in figure 2.4. It consists of a signal separator, which splits the variable-envelope input signal into two constant-envelope signals, two nonlinear PAs, which can be class D or E, and an RF power combiner. It achieves “linear amplification with nonlinear components” (LINC) [25].

The input band-pass signal can be expressed as:

\[ V_{in}(t) = V_{env}(t) \cos[\omega_0 t + \phi(t)] = V_0 \cos[\omega_0 t + \phi(t)] \cos[\arccos \frac{V_{env}(t)}{V_0}] \] (2.3)

With \( \theta(t) = \arccos \frac{V_{env}(t)}{V_0} \),

\[ V_{in}(t) = V_{env}(t) \cos[\omega_0 t + \phi(t)] = V_0 \cos[\omega_0 t + \phi(t)] \cos[\theta(t)] \] (2.4)

Using the product-to-sum trigonometric identity, the input signal can be separated into two constant-envelope phase-modulated components:

\[ V_1(t) = \frac{V_0}{2} \cos[\omega_0 t + \phi(t) + \theta(t)] \]  
\[ V_2(t) = \frac{V_0}{2} \cos[\omega_0 t + \phi(t) - \theta(t)] \]  

This separation requires significant complexity, so a more practical approach is again to employ a CORDIC. Nevertheless, both the amplitude and the phase information are contained in these components. Since their envelope is constant, switching-mode PAs, which can be highly efficient but are nonlinear, can be used to amplify them. It can be shown that:

\[ V_o(t) = A_v(V_1(t) + V_2(t)) = A_v V_{\text{in}}(t) \]  
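The identity $V_1(t) + V_2(t) = V_{in}(t)$ can be checked numerically. The sketch below is an illustrative floating-point check, with $V_0 = 1$, a 900 MHz carrier and arbitrarily chosen sample points as assumptions. It models only the signal separation, not the PAs or the combiner.

```python
import math

V0 = 1.0                           # maximum envelope (assumed normalization)
OMEGA0 = 2.0 * math.pi * 900e6     # carrier angular frequency (assumed 900 MHz)


def outphase(v_env, phi, t):
    """Split a sample into two constant-envelope components V1 and V2."""
    theta = math.acos(v_env / V0)  # outphasing angle, requires 0 <= v_env <= V0
    v1 = 0.5 * V0 * math.cos(OMEGA0 * t + phi + theta)
    v2 = 0.5 * V0 * math.cos(OMEGA0 * t + phi - theta)
    return v1, v2


def v_in(v_env, phi, t):
    """Original variable-envelope band-pass signal."""
    return v_env * math.cos(OMEGA0 * t + phi)


# (v_env, phi, t) sample points, chosen arbitrarily for the check.
samples = [(0.0, 0.3, 0.0), (0.4, -1.1, 0.2e-9), (1.0, 2.0, 0.7e-9)]
max_error = max(abs(sum(outphase(*s)) - v_in(*s)) for s in samples)
```

Both components have the fixed amplitude $V_0/2$ regardless of the envelope, which is what allows nonlinear PAs to amplify them. The residual `max_error` is at the level of floating-point rounding.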

Several works have been presented based on the digital outphasing transmitter [26] [27] [28]. Madoglio, Paolo, et al. [26] used an inverter-based class-D PA stage for good efficiency. In the two constant-envelope component paths, 8-bit delay-based open-loop phase modulators are used to meet the Wi-Fi specifications and enable wideband operation. The signals are combined using a transformer. This work achieved 20 dBm average output power with 22% PAE from a 2.05 V supply and supported channel bandwidths up to 40 MHz. Tai, Wei et al. [28] designed a multi-section transformer power combiner based on class-D PAs. The PA consists of 4 unit PAs, each of which is a self-contained outphasing stage that can be digitally controlled to tune the output power. This work achieved 31.5 dBm peak output power and 27% PAE from a 2.4 V supply.

Although outphasing can use nonlinear PAs and does not require supply modulation, several issues still need to be addressed. First, the summation of the two paths introduces power loss, and the interaction between the two paths needs to be considered. Second, the two paths occupy a larger bandwidth than the original input signal, so bandwidth becomes a critical issue, and any mismatch between the two paths causes spectral regrowth [10].

2.1.3 Digital quadrature architecture

In the polar or outphasing architecture, a CORDIC is required to convert the coordinates, which is power hungry. A quadrature transmitter, however, directly processes the in-phase and quadrature-phase (IQ) data and thus avoids the CORDIC. Bandwidth expansion therefore does not occur, and the quadrature transmitter tends to be a better choice for wideband applications. The analog quadrature transmitter is shown in figure 2.5.

Figure 2.5: The architecture of an analog quadrature transmitter.

In the analog transmitter, the mixer always suffers from the trade-off between linearity, noise performance, conversion gain and power consumption. In modern transmitters, a frequency divider is usually placed after the oscillator so that oscillator pulling is greatly suppressed; the divider usually also provides the quadrature phases. In the digital architecture, however, the mixing operation can be performed by an RF-DAC in the digital domain. Moreover, the combination occurs in the time domain, and the I/Q mismatch appears as delay mismatch and duty cycle mismatch. The digital quadrature transmitter is shown in figure 2.6.

![Figure 2.6: The architecture of a digital quadrature transmitter.](image)

In recent years the digital quadrature transmitter has drawn a lot of attention [7] [8] [29] [17]. Alavi, et al.[7] used 25% duty cycle quadrature LOs to up-convert the I and Q signals and orthogonally combined them in the RF domain. It used a switched current source DPA and achieved 22.8 dBm peak output power with 34% system efficiency. Jin, Hadong et al.[8] also used 25% duty cycle LOs, but with a switched-capacitor DPA so that the mixing happens in the digital domain. Because of the 25% duty cycle, time-division multiplexing is possible: the I and Q paths share the same cells, so the combining happens in the time domain and a transformer-based power combining network is avoided. Benefiting from the 28 nm technology, they achieved 13.9 dBm peak output power with a peak PA efficiency of 40.4%. Yuan, Wen et al.[29] also used a switched-capacitor DPA but with 50% duty cycle LOs. They used a class-G implementation for the PA to obtain higher efficiency, but separate banks are used for I and Q. In the end, this work achieved a peak PA output power of 20 dBm with 21% peak efficiency. Deng, Zhiming et al.[17] presented a dual-band digital WIFI transmitter. A high efficiency was achieved even with 50% duty cycle because of the I/Q complementary encoding scheme for the PA cells. A class-D DPA with a 3.3 V supply was used. In the end, the 2.4 GHz/5.5 GHz transmitter achieved peak output power of 27.0/25.5 dBm. The DPA efficiency was 14.5%/14.1% at 8 dB/7.5 dB power back-off.

Therefore, the digital quadrature transmitter is not only suitable for wideband applications but also has the potential to be power efficient.

2.2 Proposed architecture

2.2.1 General consideration

With the discussion made above, we propose an IQ-sharing digital transmitter. The IQ-sharing concept will be discussed in detail later in this chapter. The key point of IQ-sharing is that instead of using 2 separate RF-DACs, the I and Q branches share one RF-DAC, so that both the maximum output power and the efficiency can be higher than in the traditional design. The block diagram of the proposed IQ-sharing digital quadrature architecture is shown in figure 2.7.

![Figure 2.7: The architecture of proposed IQ-sharing digital quadrature transmitter.](image)

Before discussing how to design each specific module, it is necessary to first determine the LO duty cycle. On one hand, time domain mismatches affect different duty cycles differently, which means the EVM needs to be evaluated for each duty cycle. On the other hand, from the previous description, it is clear that the duty cycle of the clock signal affects the PA operation and hence the efficiency. [29] and [8] both used a switched-capacitor DPA, but [8] used 25% duty cycle LOs, so time-division multiplexing achieved much higher efficiency than [29]. The difference between 25% and 50% duty cycle LOs or clock signals is analyzed below.

The EVM will first be analyzed. The way to calculate the EVM is shown in figure 2.8 [10].

The EVM is defined as the ratio of the rms magnitude of the error vectors to the magnitude of the ideal signal vector:

\[
EVM = \frac{1}{V_{\text{ideal}}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} e_i^2} \quad (2.8)
\]

The 25% duty cycle clocks and the 50% duty cycle clocks with the same frequency are shown in figure 2.9.

Since the output voltage of the transmitter goes through a matching network, which is also a band-pass filter with a reasonable quality factor, only the fundamental component of the square wave remains at the output. Therefore this fundamental component can be used as the ideal vector \( V_{\text{ideal}} \). According to the Fourier series, the fundamental components of the 25% duty cycle clocks with 4 phases in figure 2.9(a) are given below.

\[
\vec{U}_{1,0} = \frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \cos(\omega t) \quad (2.9)
\]

\[
\vec{U}_{0,1} = \frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \cos\left(\omega t - \frac{\pi}{2}\right) = \frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \sin(\omega t) \quad (2.10)
\]

\[
\vec{U}_{-1,0} = \frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \cos(\omega t - \pi) = -\frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \cos(\omega t) \quad (2.11)
\]

\[
\vec{U}_{0,-1} = \frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \cos\left(\omega t - \frac{3\pi}{2}\right) = -\frac{2}{\pi} \sin\left(\frac{\pi}{4}\right) \sin(\omega t) \quad (2.12)
\]

They can be drawn as unit vectors in the Cartesian coordinate system, as shown in figure 2.10.

Figure 2.10: The fundamental components of the 25\% duty cycle clocks shown as unit vectors.

Assume all 4 signals have a duty cycle mismatch of \( \Delta t_1 \) and using the first phase fundamental as a reference, other phases have a delay mismatch of \( \Delta t_2 \). Then these 4 fundamental components can be rewritten as:

\[ \vec{U}_{1,0}' = \frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \cos(\omega t) \quad (2.13) \]

\[ \vec{U}_{0,1}' = \frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \sin(\omega t + \omega \Delta t_2) \quad (2.14) \]

\[ \vec{U}_{-1,0}' = -\frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \cos(\omega t + \omega \Delta t_2) \quad (2.15) \]

\[ \vec{U}_{0,-1}' = -\frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \sin(\omega t + \omega \Delta t_2) \quad (2.16) \]

Then the error vectors are respectively calculated as:

\[ \vec{e}_{1,0} = \frac{2}{\pi} \left[ \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) - \sin\left(\frac{\pi}{4}\right) \right] \cos(\omega t) \quad (2.17) \]

\[ \vec{e}_{0,1} = \frac{2}{\pi} \left[ \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \cos(\omega \Delta t_2) - \sin\left(\frac{\pi}{4}\right) \right] \sin(\omega t) \]

\[ + \frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \sin(\omega \Delta t_2) \cos(\omega t) \quad (2.18) \]

\[ \vec{e}_{-1,0} = -\frac{2}{\pi} \left[ \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \cos(\omega \Delta t_2) - \sin\left(\frac{\pi}{4}\right) \right] \cos(\omega t) \]

\[ + \frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \sin(\omega \Delta t_2) \sin(\omega t) \quad (2.19) \]

\[ \vec{e}_{0,-1} = -\frac{2}{\pi} \left[ \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \cos(\omega \Delta t_2) - \sin\left(\frac{\pi}{4}\right) \right] \sin(\omega t) \]

\[ - \frac{2}{\pi} \sin\left(\frac{\pi}{4} + \pi \frac{\Delta t_1}{T}\right) \sin(\omega \Delta t_2) \cos(\omega t) \quad (2.20) \]

The 50% duty cycle clocks with 4 phases in figure 2.9(b) have the fundamental components shown below:

\[ \vec{U}_{1,1} = \frac{2}{\pi} \sin\left(\frac{\pi}{2}\right) \frac{1}{\sqrt{2}} \left( \cos(\omega t) + \sin(\omega t) \right) \quad (2.21) \]

\[ \vec{U}_{-1,1} = \frac{2}{\pi} \sin\left(\frac{\pi}{2}\right) \frac{1}{\sqrt{2}} \left( -\cos(\omega t) + \sin(\omega t) \right) \quad (2.22) \]

\[ \vec{U}_{-1,-1} = \frac{2}{\pi} \sin\left(\frac{\pi}{2}\right) \frac{1}{\sqrt{2}} \left( -\cos(\omega t) - \sin(\omega t) \right) \quad (2.23) \]

\[ \vec{U}_{1,-1} = \frac{2}{\pi} \sin\left(\frac{\pi}{2}\right) \frac{1}{\sqrt{2}} \left( \cos(\omega t) - \sin(\omega t) \right) \quad (2.24) \]

They can also be drawn in the same Cartesian coordinate system, as shown in figure 2.11. The dashed lines are from the 25% duty cycle clocks.

![Figure 2.11: The fundamental components of the 50% duty cycle clocks.](image)

Assuming the duty cycle mismatch is \( \Delta t_3 \) and the delay mismatch is \( \Delta t_4 \), the error vectors can be calculated as:

\[ \vec{e}_{1,1} = \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] - \sin\left(\frac{\pi}{2}\right) \right] \cos(\omega t) \]

\[ + \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) - \sin(\omega \Delta t_4) \right] - \sin\left(\frac{\pi}{2}\right) \right] \sin(\omega t) \quad (2.25) \]

\[ \vec{e}_{-1,1} = \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ -\cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] + \sin\left(\frac{\pi}{2}\right) \right] \cos(\omega t) \]

\[ + \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] - \sin\left(\frac{\pi}{2}\right) \right] \sin(\omega t) \quad (2.26) \]

\[ \vec{e}_{-1,-1} = \frac{\sqrt{2}}{\pi} \left[ -\sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] + \sin\left(\frac{\pi}{2}\right) \right] \cos(\omega t) \]

\[ + \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ -\cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] + \sin\left(\frac{\pi}{2}\right) \right] \sin(\omega t) \quad (2.27) \]

\[ \vec{e}_{1,-1} = \frac{\sqrt{2}}{\pi} \left[ \sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) - \sin(\omega \Delta t_4) \right] - \sin\left(\frac{\pi}{2}\right) \right] \cos(\omega t) \]

\[ + \frac{\sqrt{2}}{\pi} \left[ -\sin\left(\frac{\pi}{2} + \pi \frac{\Delta t_3}{T}\right) \left[ \cos(\omega \Delta t_4) + \sin(\omega \Delta t_4) \right] + \sin\left(\frac{\pi}{2}\right) \right] \sin(\omega t) \quad (2.28) \]

From the analysis above, the error vector for both duty cycles has the general form:

\[\vec{e}_{i,j} = a_{i,j} \cos(\omega t) + b_{i,j} \sin(\omega t), \quad i, j \in \{-1, 0, 1\}, \quad b_{1,0} = 0 \quad (2.29)\]

To explain how timing mismatch affects the EVM, we first calculate the EVM of the 25% duty cycle clocks. Assuming a duty cycle mismatch \(\Delta t_1 = 1 \text{ ps}\) (about 0.1% of the period) and a delay mismatch \(\Delta t_2 = 0\), eq. 2.17 gives \(a_{1,0} = 0.00127\), \(b_{1,0} = 0\), and \(V_{\text{ideal}} = \frac{\sqrt{2}}{\pi} \times 1\) (from the Fourier series). Then

\[EVM = \frac{\sqrt{a_{1,0}^2 + b_{1,0}^2}}{\frac{\sqrt{2}}{\pi} \times 1} = 0.28\% \ (-51 \text{ dB}) \quad (2.30)\]

Using the same method, the EVM of 2 kinds of clocks with 1 ps mismatch can be calculated and is shown in table 2.1. This mismatch is either duty cycle mismatch or delay mismatch.

**Table 2.1:** The EVM of 2 kinds of duty cycle clocks with 1 ps mismatch

<table>
<thead>
<tr>
<th>Duty cycle</th>
<th>duty cycle mismatch</th>
<th>delay mismatch</th>
</tr>
</thead>
<tbody>
<tr>
<td>25%</td>
<td>0.28%(-51 dB)</td>
<td>0.57%(-45 dB)</td>
</tr>
<tr>
<td>50%</td>
<td>0.0004%(-108 dB)</td>
<td>0.57%(-45 dB)</td>
</tr>
</tbody>
</table>

It is clear from the table that the delay mismatch would have same effects on both duty cycle clocks, but if the same amount of timing mismatch is at duty cycle, then 50% duty cycle clock would be affected much less in terms of EVM. It can be understood in figure 2.12 [30]. The slope is very small in 50% duty cycle but quite large in 25% duty cycle. That means a certain \(\Delta t\) mismatch would affect 25% duty cycle clock much more.

It should be pointed out that in reality, many other non-idealities such as quantization noise, image, leakage, etc. still dominate the EVM performance. The calculation above simply provides an analytical way to compare these 2 clocks.
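
The entries of table 2.1 can be reproduced numerically from the error-vector expressions. The following Python sketch evaluates the phase $(0,1)$ of the 25% clocks (eq. 2.18) and the phase $(1,1)$ of the 50% clocks (eq. 2.25). The function names are illustrative, and the LO period is an assumption: $T = 1/(900\ \text{MHz})$, the divided clock frequency quoted for the sign bit selector.

```python
import math

T_LO = 1 / 900e6  # assumed LO period (900 MHz clocks), in seconds


def evm_25(dt_duty, dt_delay, period=T_LO):
    """EVM of the 25% duty cycle phase (0,1), from the a/b terms of eq. 2.18."""
    amp = math.sin(math.pi / 4 + math.pi * dt_duty / period)
    phi = 2 * math.pi * dt_delay / period   # omega * delay mismatch
    ideal = math.sin(math.pi / 4)
    a = amp * math.sin(phi)                 # cos(wt) coefficient
    b = amp * math.cos(phi) - ideal         # sin(wt) coefficient
    return math.hypot(a, b) / ideal         # the common 2/pi factor cancels


def evm_50(dt_duty, dt_delay, period=T_LO):
    """EVM of the 50% duty cycle phase (1,1)."""
    amp = math.sin(math.pi / 2 + math.pi * dt_duty / period)
    phi = 2 * math.pi * dt_delay / period
    a = amp * (math.cos(phi) + math.sin(phi)) - 1.0
    b = amp * (math.cos(phi) - math.sin(phi)) - 1.0
    # the error terms carry sqrt(2)/pi, the ideal vector has magnitude 2/pi
    return math.hypot(a, b) / math.sqrt(2)


def to_db(evm):
    """Convert a linear EVM ratio to dB."""
    return 20 * math.log10(evm)


if __name__ == "__main__":
    for name, f in (("25%", evm_25), ("50%", evm_50)):
        duty, delay = f(1e-12, 0.0), f(0.0, 1e-12)
        print(f"{name}: duty {100 * duty:.4f}% ({to_db(duty):.0f} dB), "
              f"delay {100 * delay:.2f}% ({to_db(delay):.0f} dB)")
```

Under these assumptions the sketch returns about 0.28% and 0.57% for the 25% clocks and about 0.0004% and 0.57% for the 50% clocks, in line with the table.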

Moreover, the encoding scheme of the PA cells can affect the peak efficiency that can be achieved. Deng, Zhiming et al.[17] did not use 25% duty cycle LOs, but the I/Q complementary encoding still brought a high efficiency. Furthermore, the generation of 25% duty cycle clocks would require extra circuits and consume additional power. In conclusion, 50% duty cycle clocks are selected.

To sum up, the proposed digital quadrature transmitter in this project uses only 50% duty cycle LOs. The architecture consists of a digital processor module, a digital PLL, a sign bit selector module and a DPA, as shown in figure 2.13.

In the architecture, the quadrature data and all control signals are generated by the FPGA. Then, the I and Q data are processed in the digital processor. For both I and Q data, the output of this module can be separated into 2 parts: the sign bit and the remaining bits. Both I and Q sign bits are used in the sign bit selector module to control the clock phases. The remaining bits represent the amplitude of the I or Q data and are processed directly by the power amplifier.

![Figure 2.12: The fundamental component of a square wave vs the duty cycle.](image1)

![Figure 2.13: The proposed transmitter architecture overview.](image2)

On the other hand, the ADPLL generates differential sinusoidal signals based on corresponding modulation mode controlled by the FPGA. Then, these LO signals go into the sign bit selector module and are further processed. The output of the sign bit selector module would be 2 orthogonal square wave clock signals according to phases of I and Q.

Finally, the DPA combines IQ amplitude data and corresponding clock signals. As a matter of fact, the matching network is also needed for antenna loading but this part will be discussed in the next chapter.

2.2.2 Digital processor

The digital processor is the first module that processes the input baseband I and Q data. The overview of this digital module is shown in figure 2.14. This digital module takes 11 bits IQ combined data as input and outputs separate I and Q data, each 7 bits wide. The input data from the FPGA uses 2’s complement representation, whereas the output requires the MSB as the sign bit to control the phase and the remaining bits as thermometer code to represent the amplitude.

Figure 2.14: The digital processor overview.

Since the input data from the FPGA is IQ combined at a double data rate of 64 MHz, it is first separated by the data demux into one I branch and one Q branch, both at 32 MHz. For each branch, in order to suppress the spectral sampling replicas, the data is up-sampled to $f_o/4$ (224 MHz) and sent to a low-pass FIR filter. Then, the filtered data is converted by a code converter to a format in which the MSB stays as the sign bit and the remaining bits are used as an unsigned integer to control the signal amplitude. After this conversion, the sigma-delta modulator pushes the quantization noise towards high frequencies and reduces the data width to 7 bits. The I and Q output data from their modulators then enter a saturater, which limits the sum of the amplitude of code I and the amplitude of code Q to at most 63. The saturater only acts when the sum of I and Q is too large, so in most cases its input and output are exactly the same. In the end, each 7 bits I and Q word is decoded by the decoder into a format in which the sign bit represents the phase and the remaining bits represent the amplitude as thermometer code.
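
As an illustration of the code converter and the saturater, the following Python sketch converts signed samples to sign and magnitude and enforces the limit of 63 on the amplitude sum. The function names are hypothetical, and scaling both amplitudes proportionally is only an assumption: the text does not specify how the saturater reduces the codes.

```python
def to_sign_magnitude(value, mag_bits=6):
    """Split a signed sample into (sign bit, magnitude), clipping the magnitude."""
    sign = 1 if value < 0 else 0
    mag = min(abs(value), (1 << mag_bits) - 1)
    return sign, mag


def saturate(i_mag, q_mag, limit=63):
    """Keep i_mag + q_mag <= limit (proportional scaling is an assumption)."""
    total = i_mag + q_mag
    if total <= limit:
        return i_mag, q_mag          # common case: pass through unchanged
    # integer floor division guarantees that the scaled sum stays <= limit
    return i_mag * limit // total, q_mag * limit // total


if __name__ == "__main__":
    i_sign, i_mag = to_sign_magnitude(-40)
    q_sign, q_mag = to_sign_magnitude(50)
    print(i_sign, q_sign, saturate(i_mag, q_mag))
```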

2.2.3 Sign bit selector

The sign bit selector module is responsible for generating two clock signals at $f_o$ with 90° phase difference. The overview is shown in figure 2.15.

![Sign bit selector diagram](image)

**Figure 2.15**: The sign bit selector module overview.

It first serves as a buffer for the differential 1.8 GHz LO signals from ADPLL and converts the sinusoidal signals to square waves. Then the divider divides the differential 1.8 GHz clock signals into 900 MHz signals of 4 phases: CK1: 0°, CK2: 90°, CK3: 180°, CK4: 270°. Based on the MSB of I data, which is the output of digital processor module, clock signal for I (or CKi) is selected from CK1 and CK3. Just like this, clock signal for Q (or CKq) is chosen from CK2 and CK4. Therefore, 2 MUXs are required to finish the selection. The design details of the divider and MUX will be discussed in chapter 4.
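
The selection performed by the 2 MUXs can be sketched in a few lines of Python. The mapping of a sign bit of 0 to CK1/CK2 is an assumed polarity, and the function name is illustrative.

```python
PHASES_DEG = {"CK1": 0, "CK2": 90, "CK3": 180, "CK4": 270}


def select_clocks(i_sign, q_sign):
    """Return the names of (CKi, CKq) for the given I and Q sign bits."""
    cki = "CK3" if i_sign else "CK1"   # I chooses between 0 and 180 degrees
    ckq = "CK4" if q_sign else "CK2"   # Q chooses between 90 and 270 degrees
    return cki, ckq


if __name__ == "__main__":
    for i_sign in (0, 1):
        for q_sign in (0, 1):
            cki, ckq = select_clocks(i_sign, q_sign)
            print(i_sign, q_sign, cki, PHASES_DEG[cki], ckq, PHASES_DEG[ckq])
```

Whatever the sign bits are, the two selected clocks stay in quadrature, which is what the DPA needs to combine I and Q.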

2.2.4 Retiming

After the IQ amplitude signals and the corresponding clock signals are generated, they have to be combined and converted into analog signals by the DPA. This combination can be implemented as a simple AND operation. But before these signals enter each PA cell, one more thing should be considered. Ideally, the combined result should have exactly 50% duty cycle. However, since the generation and transitions of the IQ amplitude signals and their clock signals are totally uncorrelated, alignment is required. Otherwise glitches, i.e. pulses whose duty cycle deviates from 50%, can appear after these signals are combined. Using the I amplitude and CKi as an example, this effect is shown in figure 2.16, where the amplitude is used as the cell enable signal. Since the amplitude data is represented by bits, the I_EN in this figure and afterwards is only 1 bit.

To align both of the CKi and CKq with the I and Q amplitude signals, a retiming clock($CK_{retime}$) is required. And the retiming mechanism is shown in figure 2.17.

Each active PA cell needs to choose between CKi and CKq. Therefore the retiming mechanism needs to make sure that the combined result has 50% duty cycle no matter which clock signal is chosen. That means that if CKi or CKq is used as the reference for the retiming clock, it should be the one with the lagging falling edge, in this case CKq.

Therefore, the retiming clock $CK_{retime}$ can be generated by an OR operation of CKi and CKq, whose falling edge coincides with the lagging falling edge. Then, the falling edge of $CK_{retime}$ triggers a D-flip-flop to retime the amplitude signal. Furthermore, the retimed amplitude is ANDed with the corresponding clock, and the required square wave with 50% duty cycle is generated. The block diagram of retiming is shown in figure 2.18.

Figure 2.16: Glitches are generated if amplitude signal and clock signal are misaligned.

Figure 2.17: The retiming mechanism.

![Figure 2.18: The block diagram of retiming.](image)
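
The effect of retiming can be illustrated with a small discrete-time model. In this sketch every LO period is divided into 4 samples, CKq lags CKi by one sample, and the flip-flop is modelled as sampling the enable on each falling edge of CKi OR CKq. All names and the test pattern are illustrative assumptions, not the implemented circuit.

```python
def make_clocks(n_periods):
    """Four samples per LO period; CKq lags CKi by a quarter period."""
    cki = [1, 1, 0, 0] * n_periods
    ckq = [0, 1, 1, 0] * n_periods
    return cki, ckq


def retime(enable, cki, ckq):
    """Sample `enable` on every falling edge of CK_retime = CKi OR CKq."""
    retimed, state, prev = [], 0, 0
    for en, a, b in zip(enable, cki, ckq):
        ck_retime = a | b
        if prev == 1 and ck_retime == 0:   # falling edge: both clocks are low
            state = en
        retimed.append(state)
        prev = ck_retime
    return retimed


def pulse_widths(signal):
    """Lengths (in samples) of each run of ones."""
    widths, run = [], 0
    for s in signal + [0]:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


cki, ckq = make_clocks(4)
enable = [0] * 5 + [1] * 8 + [0] * 3       # toggles in the middle of clock pulses
retimed = retime(enable, cki, ckq)
raw = [e & c for e, c in zip(enable, cki)]         # AND without retiming
clean_i = [e & c for e, c in zip(retimed, cki)]    # AND after retiming, CKi
clean_q = [e & c for e, c in zip(retimed, ckq)]    # AND after retiming, CKq
print(pulse_widths(raw), pulse_widths(clean_i), pulse_widths(clean_q))
```

Without retiming the gated clock contains one-sample glitches, whereas the retimed enable only produces full two-sample (50% duty cycle) pulses for both CKi and CKq.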

### 2.2.5 PA stage

The PA used in this project is a switched-capacitor DPA. However, this section would not discuss circuit level details about PA design. Instead, general considerations would be introduced in this section.

#### 2.2.5.1 Resolution

The retimed amplitude signals and clock signals enter the PA stage. The PA used in this project is a bank consisting of several PA cells, so the amplitude modulation can be simplified as a process of changing the number of active PA cells. This is why the power amplifier in a digital transmitter is often called an RF-DAC in many papers. For this digital-to-RF conversion, the total number of PA cells needs to be decided first. The quantization noise power is:

\[
quantization\ noise = \frac{V_{LSB}^2}{12} \quad (2.31)
\]

A higher resolution results in a smaller \(V_{LSB}\) and therefore a lower quantization noise. Assuming the number of bits is \(N\), the signal-to-quantization noise ratio (SNR) can be represented as:

\[
SNR = 1.76 + N \times 6.02 \text{ dB} \quad (2.32)
\]
Because the target bandwidth is 8 MHz and the sampling frequency is 224 MHz, an oversampling gain of $10 \log_{10}\left(\frac{f_s}{2 \times BW}\right) = 11.5 \text{ dB}$ should be added.

In this project, 6 bits resolution is used to maintain low quantization noise. In this case, the SNR is $1.76 + 6 \times 6.02 + 11.5 = 49.4 \text{ dB}$, which is high enough for the 40 dB rejection ratio required by the spectral mask.
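
The SNR budget can be checked with a short Python sketch. It assumes the usual oversampling gain $10\log_{10}(f_s / (2 \times BW))$ on top of the ideal quantization SNR; the function name is illustrative.

```python
import math


def dac_snr_db(bits, f_sample, bandwidth):
    """Ideal quantization SNR plus the oversampling gain, in dB."""
    oversampling_gain = 10 * math.log10(f_sample / (2 * bandwidth))
    return 1.76 + 6.02 * bits + oversampling_gain


snr = dac_snr_db(bits=6, f_sample=224e6, bandwidth=8e6)
print(f"{snr:.1f} dB")
```

With the oversampling gain left unrounded, the result is roughly 49.3 dB, which still leaves margin over the 40 dB required by the spectral mask.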

### 2.2.5.2 Encoding consideration

After the resolution is chosen, a suitable encoding representation needs to be selected, because different ways of encoding bring different advantages and disadvantages. For a digital-to-analog converter, the differential nonlinearity (DNL) is an important figure of merit. It represents the deviation of each step from the ideal LSB size, normalized to the LSB. The mathematical formulation is shown in eq. 2.33.

$$DNL(i) = \frac{V(i+1) - V(i) - V_{LSB}}{V_{LSB}} \quad (2.33)$$

For a binary weighted DAC, it can be shown in eq. 2.34 that the DNL would increase if resolution is higher. $\sigma_\epsilon$ is the standard deviation of component mismatch of each cell, $B$ is the number of bits.

$$\sigma_{DNL}^2 = (2^B - 1)\sigma_\epsilon^2 \quad (2.34)$$

Therefore, if the 6 bits resolution, which means 63 cells, is encoded entirely in binary, the DNL would be inferior because many bits change at major code transitions, e.g. from 011111 to 100000. On the other hand, if the 6 bits are represented entirely in unary (or thermometer code), a decent DNL can be achieved but the routing complexity and buffer power consumption would significantly increase. Therefore, a trade-off needs to be made. Figure 2.19 shows several combinations of $x$ bits encoded in binary and $(6 - x)$ bits encoded in unary, $0 \leq x \leq 6$. In this case, the PA can be separated into a binary bank and a unary bank. In this figure, each little square represents a PA unit cell; a red square is a PA cell in the binary bank and a blue square is a PA cell in the unary bank.

In this work, $x$ is chosen to be 3. So for the total 6 bits resolution, the lower 3 bits (bits 0, 1 and 2) are encoded in binary with weights of 1, 2 and 4 cells, so 7 cells are used in total. The higher 3 bits are represented in unary by 7 elements (1, 2, 3, 4, 5, 6, 7), and each unary element contains 8 cells, so 56 cells are used in total.
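
The segmentation with $x = 3$ can be written down as a short Python sketch that splits a 6 bits amplitude into its binary part and its thermometer-coded unary part. The function names are illustrative.

```python
def segment_code(amplitude, binary_bits=3, total_bits=6):
    """Split an amplitude into binary LSBs and a thermometer-coded unary part."""
    if not 0 <= amplitude < (1 << total_bits):
        raise ValueError("amplitude out of range")
    binary = amplitude & ((1 << binary_bits) - 1)       # lower bits, weights 1, 2, 4
    unary_on = amplitude >> binary_bits                 # number of active unary elements
    n_unary = (1 << (total_bits - binary_bits)) - 1     # 7 unary elements for x = 3
    thermometer = [1] * unary_on + [0] * (n_unary - unary_on)
    return binary, thermometer


def active_cells(binary, thermometer, binary_bits=3):
    """Number of unit cells switched on by a segmented code."""
    return binary + (1 << binary_bits) * sum(thermometer)


if __name__ == "__main__":
    print(segment_code(13))   # binary part 5, one unary element on
    print(segment_code(27))   # binary part 3, three unary elements on
```

For every code from 0 to 63 the number of active unit cells equals the amplitude, which is the property the DPA relies on.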

### 2.2.5.3 IQ-sharing

For a quadrature transmitter, the straightforward approach to arrange the power amplifier is to individually separate the PA into 2 sub-PAs for I and Q branch. Because the power amplifier employed in this project is the digital-controlled PA, this approach would require both I and Q branch to modulate a certain amount of PA unit cells and then combine them, for instance, in the current domain. This can be shown in figure 2.20. In this figure, each little square represents a PA unit cell and I and Q branch employ a PA bank individually.

Figure 2.19: Different combination of $x$ bits encoded in binary and $(6-x)$ bits encoded in unary.

Figure 2.20: Separate PA banks in the quadrature transmitter.

This approach has at least 2 drawbacks. The first one is about linearity. Because the power amplifiers for the I and Q branches are completely separate banks, the mismatch between these 2 banks causes mismatch between the I and Q signals, which in the end degrades the EVM. The other one is that the maximum output power would be 3 dB lower than in the polar architecture. To elaborate on this effect, assume the maximum output voltage contributed by each bank is $V_o$; then the combined output voltage is $\sqrt{2}V_o$ because the I signal and Q signal have a 90° phase difference. However, the output voltages contributed by all the PA cells in a polar architecture add in phase because there is no phase difference between these signals, so the maximum combined output voltage would be $2V_o$. Therefore, the output of this approach would be 3 dB lower.
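
As a short worked step, the 3 dB figure follows directly from the ratio of the two combined voltages, using only the quantities defined above:

\[
20 \log_{10}\left(\frac{2V_o}{\sqrt{2}V_o}\right) = 20 \log_{10}\sqrt{2} \approx 3.01 \text{ dB}
\]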

Another approach, proposed in this report, called IQ-sharing, can provide the maximum output power which is equal to the one in the polar architecture. This IQ-sharing method can be shown in figure 2.21.

In this approach, only 1 PA bank is used. As shown in the figure, the cells enabled by the I signal start from one end and the cells enabled by the Q signal start from the opposite end. The maximum output power occurs when all the cells are enabled by the I signal or all by the Q signal. In this case, the maximum combined output voltage is $2V_o$. Compared to the first approach, IQ-sharing brings higher maximum output power while the PA’s power consumption does not change, so the maximum efficiency is also higher.

Furthermore, the mismatch problem can be somewhat alleviated in this approach because the I and Q signals share PA cells in some cases. For example, at peak output power, either all cells are enabled with $CK_i$ or all with $CK_q$, which means the 2 signals experience the same mismatch by sharing cells. This is why it is named IQ-sharing.

### 2.2.5.4 Constellation consideration

The constellation points that this 6 bits IQ-sharing architecture can achieve are shown in figure 2.22.

Take the point $(I, Q) = (3, 4)$ as an example, the 0th and 1st binary bit would be used to generate $I$, the 2nd binary bit would be used to generate $Q$. Therefore this point can be generated successfully with current encoding method.

However, there are some points that cannot be represented by this way of encoding. For example, for the point $(I, Q) = (3, 1)$, $I = 3$ would require the 0th and 1st binary bit but $Q = 1$ would also need the 0th binary bit. However, one bit, which means a group of cells, cannot be shared by both I and Q at the same time because two 50% duty cycle square waves with 90° phase difference always partly overlap in the time domain. To solve this problem, 2 identical binary banks are used in this project. These 2 binary banks both represent the 3 lower bits but one bank is only for I and the other is only for Q. In this way, every constellation point in figure 2.22 can be generated. So the total number of cells is $56 + 7 + 7 = 70$. The final arrangement of the PA cells is shown in figure 2.23.

![Figure 2.23: The final arrangement of PA cells.](image)

However, one main drawback of the extra binary bank is that it decreases the output power. Although the circuit level details of the employed PA have not been introduced yet, the output power can still be explained. The output amplitude is proportional to the ratio of the number of active PA cells to the total number of PA cells. For example, suppose there are \( n \) active cells, with \( x \) bits encoded in binary and \( (6 - x) \) bits encoded in unary. The total number of PA cells would be \( N = 63 + 2^x - 1 \). Then the output power can be written as:

\[
P_{\text{out}} \propto \left( \frac{n}{N} \right)^2 = \left( \frac{n}{63 + 2^x - 1} \right)^2
\]

It can be seen that the more bits are encoded in binary, the lower the output power this PA can provide. With \( x = 3 \), the peak output power loss is:

\[
P_{\text{loss}} = 10 \log_{10} \left( \frac{1}{\left( \frac{63}{63+2^x-1} \right)^2} \right) = 0.92 \text{ dB}
\]

The peak and average efficiency could also be worse, but this drop is small because several energy losses would also decrease. This will be discussed further in the next chapter.
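
The dependence of the peak output power loss on the number of binary bits can be tabulated with a short Python sketch of the expression above. The function and parameter names are illustrative.

```python
import math


def peak_power_loss_db(binary_bits, code_cells=63):
    """Peak output power loss caused by the duplicated binary bank, in dB."""
    total_cells = code_cells + (1 << binary_bits) - 1   # N = 63 + 2^x - 1
    return 20 * math.log10(total_cells / code_cells)


if __name__ == "__main__":
    for x in range(7):
        print(x, round(peak_power_loss_db(x), 2))
```

For $x = 3$ this gives about 0.92 dB, and the loss grows quickly for larger $x$ because the duplicated bank contains $2^x - 1$ cells.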

A general working case can be shown in figure 2.24. In this case, \((I, Q) = (13, 27)\). So 

\[
I = (8 \times 1) + (1 + 4), \quad Q = (8 \times 3) + (1 + 2).
\]

![Figure 2.24: A general case of the PA: \((I, Q) = (13, 27)\).](image)
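
The cell assignment of this general case can be sketched in Python. The sketch assumes that the unary bank is filled with I from one end and Q from the other, and that each branch uses its own binary bank, as described above. The data layout and names are illustrative.

```python
def allocate_cells(i_amp, q_amp, binary_bits=3, n_unary=7):
    """Assign unary elements to I from the left and to Q from the right."""
    if i_amp < 0 or q_amp < 0 or i_amp + q_amp > 63:
        raise ValueError("amplitudes must be non-negative with I + Q <= 63")
    mask = (1 << binary_bits) - 1
    i_unary, q_unary = i_amp >> binary_bits, q_amp >> binary_bits
    idle = n_unary - i_unary - q_unary          # never negative when I + Q <= 63
    unary_bank = ["I"] * i_unary + ["-"] * idle + ["Q"] * q_unary
    return {"unary": unary_bank, "i_binary": i_amp & mask, "q_binary": q_amp & mask}


if __name__ == "__main__":
    print(allocate_cells(13, 27))
```

For $(I, Q) = (13, 27)$ one unary element is driven by I and three by Q, while the I and Q binary banks hold 5 and 3 respectively.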

2.3 Conclusion

In this chapter, several state-of-the-art digital transmitter architectures are first reviewed. The quadrature IQ-sharing architecture is proposed for this project. For the proposed architecture, first, the clock duty cycle is considered. Then, each module is briefly introduced. For the power amplifier, important features like encoding are discussed. IQ-sharing is the primary technique used to improve the efficiency of a quadrature transmitter.
Chapter 3

Digital power amplifier design

The power amplifier is the module that provides impedance matching for the antenna and brings the output power to the required level in an efficient way. As discussed in preceding chapters, this transmitter design uses a digital power amplifier (DPA). In this chapter, first, the considerations of PA design are introduced. Then, a comparison between the traditional PA and the DPA is given, and several state-of-the-art DPA architectures are discussed together with their advantages and disadvantages. Next, the design and optimization of the DPA used in this project are presented in detail. The following part covers the sign bit selector, a module related to the PA, and its design details. In the end, a summary of the designed PA stage is given.

3.1 General considerations

3.1.1 Output power, efficiency and linearity

The main job of a PA is to deliver the required power to the antenna in an efficient manner. The output power of a PA can be defined as the RF power delivered to the load within the band of interest. For a sinusoidal signal, the output power can be expressed as:

\[ P_{out} = \frac{V_{out}^2}{2R_L} \]  

(3.1)

\( V_{out} \) is the output voltage on the load and \( R_L \) is the load resistance. For a traditional PA, there is usually a trade-off between output power and voltage stress of the transistor, because it is quite normal that the drain voltage of the transistor reaches \( 2V_{DD} \), or even higher. As CMOS technology scales down, the supply voltage also decreases. This brings a new problem for an RF PA because it leads to lower output power. To produce the same output power, a larger impedance transformation ratio is required. But this comes at the cost of more resistive loss in the impedance transformation network, which lowers the efficiency.

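To make this trade-off concrete, the following Python sketch solves eq. 3.1 for the load resistance and compares it with the antenna impedance. The 50 Ω antenna, the 20 dBm target and the voltage swings are hypothetical example values, not design values of this work.

```python
def required_load(p_out_w, v_out_peak):
    """Load resistance needed for p_out_w, from P_out = V_out^2 / (2 R_L)."""
    return v_out_peak ** 2 / (2 * p_out_w)


def transformation_ratio(p_out_w, v_out_peak, antenna_ohm=50.0):
    """Impedance transformation ratio from the antenna down to the PA load."""
    return antenna_ohm / required_load(p_out_w, v_out_peak)


if __name__ == "__main__":
    for v_peak in (1.0, 0.5):                  # hypothetical output swings in volts
        r_load = required_load(0.1, v_peak)    # 0.1 W corresponds to 20 dBm
        ratio = transformation_ratio(0.1, v_peak)
        print(f"V = {v_peak} V: R_L = {r_load:.2f} ohm, ratio = {ratio:.0f}")
```

Halving the available swing quarters the required load resistance, so the transformation ratio grows by a factor of 4.
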
The drain efficiency is defined as:

\[
\eta = \frac{P_{out}}{P_{DC}}
\]  

(3.2)

\(P_{out}\) is the output power and \(P_{DC}\) is the DC power consumption.

The linearity can be evaluated by several metrics. In this project, INL is used instead of traditional IIP3 or OIP3 due to the fact that the DPA converts digital code into RF analog signal. In other words, the INL is a superior way to evaluate this digital-to-analog conversion process. The INL can be defined as:

\[
INL = \frac{V(i) - i \times V_{LSB}}{V_{LSB}}, \forall i = 0, 1, \ldots \left(2^{\text{number of bits}} - 1\right)
\]  

(3.3)

The output spectrum can also be used to show the nonlinearity. The spectral mask sets a limit for the power density of the wanted and all the unwanted spurious signals.

### 3.1.2 Current-mode PAs and switch-mode PAs

Traditionally, the power amplifier can be categorized into several classes. Class A, AB, B and C can be considered as a current source operation. This kind of PA can be shown in a general structure in figure 3.1.

![Figure 3.1: The general structure of a current source PA.](image)

Moreover, the difference of class A, AB, B and C PAs lies in their biasing condition. The different biasing point brings different conduction angle of the output transistor so that the drain voltage and the drain current would be different which leads to a different efficiency. The drain voltage and drain current waveforms can be shown in figure 3.2.

Figure 3.2: The drain voltage and drain current of current-mode PAs at full drive.

The DC power consumption is proportional to the overlapping area of the drain voltage and the drain current. So a smaller conduction angle means smaller DC power. At the same time, they generate the same maximum output power:

\[ P_{out} = \frac{V_{max} I_{max}}{8} \]

So the efficiency decreases as the conduction angle increases. Meanwhile, a larger conduction angle lets the PA work linearly over a larger input and output range. These comparisons are summarized in table 3.1.

Table 3.1: The comparison of current-mode PAs

<table>
<thead>
<tr>
<th>Class</th>
<th>A</th>
<th>AB</th>
<th>B</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conduction Angle</td>
<td>$2\pi$</td>
<td>$2\pi \sim \pi$</td>
<td>$\pi$</td>
<td>$\pi \sim 0$</td>
</tr>
<tr>
<td>Efficiency</td>
<td>$\leq 50\%$</td>
<td>$\leq 78\%$</td>
<td>$\leq 78\%$</td>
<td>$\leq 100\%$</td>
</tr>
<tr>
<td>Linearity</td>
<td>Very good</td>
<td>Good</td>
<td>Moderate</td>
<td>Poor</td>
</tr>
</tbody>
</table>

Although class-C PA can theoretically offer 100% efficiency, output power delivered to the load in that case would be zero. So, choosing between these current-mode PAs is always a trade-off among output power, efficiency and linearity.
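
The trends in table 3.1 can be checked with the standard reduced-conduction-angle expressions for an ideal current-mode PA with full voltage swing. These are textbook results that are not derived in this chapter, and the function names are illustrative. Here `alpha` is the full conduction angle in radians.

```python
import math


def drain_efficiency(alpha):
    """Ideal drain efficiency for a conduction angle alpha (0 < alpha <= 2*pi)."""
    num = alpha - math.sin(alpha)
    den = 2 * (2 * math.sin(alpha / 2) - alpha * math.cos(alpha / 2))
    return num / den


def relative_output_power(alpha):
    """Fundamental output power normalized to class A for the same I_max."""
    return (alpha - math.sin(alpha)) / (math.pi * (1 - math.cos(alpha / 2)))


if __name__ == "__main__":
    for name, alpha in (("A", 2 * math.pi), ("B", math.pi), ("C", 0.1)):
        print(name, round(drain_efficiency(alpha), 3),
              round(relative_output_power(alpha), 3))
```

Class A and class B deliver the same fundamental power with 50% and about 78.5% efficiency, while a very small conduction angle approaches 100% efficiency only as the output power goes to zero.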

At the same time, a conclusion can be made that to increase the efficiency of a power amplifier, the key is to decrease the overlapping time in which the output transistor carries a large current and a large voltage. Additionally, it can be observed from figure 3.2 that the current and voltage of these PAs are all sinusoidal (or parts of sinusoidal). If the waveform can be changed, for example, to square wave, then less overlapping can be achieved and, hence, the efficiency can be improved.

A special current-mode PA, class-F PA, is based on this principle. The schematic is shown in figure 3.3.

Class-F PAs use an LC tank resonating at twice or three times the fundamental frequency to create a high impedance for the second or third harmonic. So the edges of \(V_X\) are sharper than those of a sinusoidal signal and the power loss can be reduced. A quarter-wavelength transmission line can also be used to replace the resonator to control more high-order harmonics. In practice, however, controlling more than 3 harmonics is hard and leads to a much more narrow-band PA. Another drawback is the bulkiness of the resonator or transmission line.

On the other hand, switch-mode PAs can also benefit from a small overlap of the drain voltage and the drain current. So, theoretically, their efficiency can also reach 100%. Instead of operating as a current source, the transistors in switch-mode PAs work as switches. Two well-known switch-mode PAs are class-E and class-D PAs.

A class-E power amplifier is shown in figure 3.4. The output transistor $M_1$ works as a switch rather than a voltage-controlled current source. If we ignore the transition time, when the switch is on, the transistor works in the triode region, thus $V_X$ is low, $I_D$ is high. When the switch is off, the transistor is in cut-off region, $V_X$ is high, $I_D$ is zero.

Although it seems that the overlapping region is small, three conditions [10] need to be satisfied to minimize the transistor power loss: (1) When the transistor starts to turn off, $I_D$ drops, but $V_X$ needs to rise slowly. This can be guaranteed by $C_1$; (2) When the transistor starts to turn on, $V_X$ should have already dropped to zero; (3) When the transistor starts to turn on, the derivative of $V_X$ should also be close to zero. These conditions are shown in figure 3.5 [10].

To satisfy the second and third conditions, the second-order damping system, composed of the load network, should be designed to be critically damped. It should be mentioned that although the third condition makes solving for the component values of the damping system easier, applying this condition also makes the class-E PA more sensitive to component variations. Thus, class-E variants that do not apply this condition also exist.

(a) condition (1)  
(b) condition (2) and (3)  

Figure 3.5: Three conditions are required for class-E PAs

Another advantage of the class-E PA is that the drain parasitic capacitance can be absorbed in $C_1$. The main problem of class-E power amplifiers is that the peak voltage at $V_X$ is too large, exceeding 3.59 times $V_{DD}$, which requires techniques such as a cascode structure; otherwise, the transistor might experience reliability issues.

The class-D PA also uses a square wave as input but has two transistors as switches, as depicted in figure 3.6. The whole active part operates as an inverter.

Figure 3.6: The schematic of a class-D PA

The drain voltage and current waveforms are shown in figure 3.7. As shown in the figure, there is no overlap between the drain voltage and current, so, theoretically, 100% efficiency can also be achieved.

Because the power amplifier works as an inverter, the peak-to-peak voltage at $V_X$ is just $V_{DD}$. Compared to the class-E PA, the transistor does not suffer from the reliability problem. However, the major drawback is that delivering a high output power is hard unless the matching network can offer a large impedance transformation ratio, in which case the power loss of the passive components would also be large. Compared to the class-E power amplifier, the class-D PA also requires a PMOS transistor. To achieve an on-resistance similar to that of the NMOS transistor, the PMOS switch requires more than 2 times the width (or size), which means a larger input capacitance and larger driving power. Another drawback is that its matching network cannot absorb the drain parasitic capacitance as in class-E PAs, which results in a power loss of $CV^2f$.

To summarize, due to the switch-mode operation, these PAs can potentially achieve higher efficiency compared to the current-mode PAs. With scaled CMOS technology, the minimum feature size and supply voltage are scaled down, which makes PA design more complicated. However, MOSFET switches can operate faster and consume less dynamic power. So, switch-mode PAs are more suitable for advanced CMOS technology.

#### 3.1.3 Single-ended and differential PAs

The power amplifier design must consider the package parasitics. For an on-chip PA, the $V_{DD}$ and $GND$ need to be connected with the external supply and ground using bonding wire, which acts as an inductor.

There are two major issues related to the bonding wire. The first is the voltage drop across it, which makes the effective supply voltage lower. Assume the current going through the bonding wire is:

$$I = I_0 \cos \omega_0 t + I_1 \quad (3.5)$$

This voltage drop can be shown as:

$$V = L \frac{dI}{dt} \propto L \omega_0 I_0 \quad (3.6)$$

This issue can be alleviated by decreasing the current, because the bonding wire inductance and the operating frequency are not easy to change.
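To give a feeling for the magnitude of this drop, the peak value $L \omega_0 I_0$ can be evaluated directly. The sketch below is only illustrative: the 1 nH inductance and the 20 mA current amplitude are assumed values and not design data from this work, and the function name is hypothetical.

```python
import math

def bondwire_peak_drop(l_bond: float, f0: float, i0: float) -> float:
    """Peak voltage across the bonding wire: L * 2*pi*f0 * I0, following eq. (3.6)."""
    return l_bond * 2.0 * math.pi * f0 * i0

# Assumed example values: 1 nH bonding wire, 900 MHz carrier, 20 mA RF current amplitude.
drop = bondwire_peak_drop(1e-9, 900e6, 20e-3)
print(f"peak drop = {drop * 1e3:.1f} mV")
```

Halving the RF current amplitude halves the drop, which is why reducing the current through the bonding wire is the practical lever here.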

The second issue is that, due to the bonding inductance, the high current of the power amplifier can travel back to the preceding circuits through the on-chip $V_{DD}$ line and cause high-frequency ripple [10]. This is shown in figure 3.8.

![Figure 3.8: The effect of current traveling back due to bonding wire.](image)

These two issues can both be mitigated by a differential architecture. In an ideal situation where the signal envelope is constant (class-A operation), when one PA decreases the current drawn from the on-chip $V_{DD}$, the other PA increases its current at the same time, as shown in figure 3.9. So the current effectively flows from one PA to the other, and the current going through the bonding wire is greatly decreased. If the PA does not operate in class-A, the differential approach only alleviates these issues to some extent.

![Figure 3.9: The problems are alleviated in a differential PA.](image)

The drawback of a differential architecture is that compared to a single-ended PA, it requires
a balun to drive a single-ended antenna, which brings extra loss.

### 3.2 State-of-the-art DPA architectures

The digital power amplifier, or DPA, is usually composed of several PA unit cells. Its input is a digital code. Note that the DPA operates as a power DAC, which is why it is also often referred to as an RF-DAC or DRAC. In this section, two main DPA architectures [31] are introduced: the current source based DPA and the switch-based DPA.

#### 3.2.1 Current source based DPA

The current source based DPA sums the current of several PA unit cells and each PA cell acts as a current source. Figure 3.10 shows an example of this kind of DPA. The amplitude control word (ACW) would be used to enable or disable the current source transistor and the other transistor is controlled by the phase signal.

![Figure 3.10: An example of the current mode DPA](image)

Kavousian et al. [32] implemented this kind of power amplifier. In this work, each cell performs as a class-A power amplifier. The average efficiency is improved compared to a traditional class-A power amplifier due to the adopted polar architecture. Using 0.18 μm technology, they achieved an average PAE of 6.7% with 13.6 dBm output power. This efficiency is still low compared to other architectures because of the class-A operation.

For this kind of PA, large output impedance is required. Usually the cascode technique is used and this would increase the required voltage headroom. A major advantage is the peak-to-peak voltage of $V_{RF}$ can be as high as $2V_{DD}$, so high output power can be easily delivered. Another advantage is that, theoretically, the output current is linearly proportional to the number of active current sources. The main drawback is low efficiency.

#### 3.2.2 Switch-based DPA

In the switch-based DPA, each unit cell performs as a switched-mode PA. The general structure is shown in figure 3.11. A switched-mode PA is split into several unit cells whose drain nodes are electrically connected. These unit cells are digitally controlled by the ACW.

![Figure 3.11: An example of the switch-based DPA](image)

Because each transistor acts as a switch, it can be modeled as a resistor, $R_{on}$, when it is on and as an open circuit when it is off. So, when the ACW gets larger, more cells are activated and the total resistance becomes smaller, which increases the output current and hence the output power.

Staszewski, et al. [11] implemented this structure with class-E amplifier and achieved a fully digital polar transmitter in 130 nm technology. Deng, et al. [17] used class-D power stage for each cell to implement a digital quadrature transmitter. Moreover, a 3.3 V supply was used for large output power and a double-cascode structure was employed for addressing reliability.

Because they are switched-mode PAs, their efficiency is high. The main drawback is poor linearity. This kind of DPA has a code-dependent output impedance, which leads to AM-AM and AM-PM distortion. So, a calibration technique such as digital pre-distortion (DPD) is usually needed, which costs extra power and area.

Another popular switch-based DPA structure is the switched-capacitor power amplifier (SCPA) [12], the details of which will be discussed in the next section.

### 3.3 DPA design and optimization

#### 3.3.1 The principle of SCPA

The SCPA combines the concept of the DPA with the switched-capacitor circuit technique. Just like a normal DPA, the SCPA is composed of many unit PA cells. Additionally, each cell operates in class-D. That means each transistor works as a switch and the whole active part performs as an inverter. Moreover, the switched-capacitor technique exploits the capacitor, which is natively precise and area-efficient in CMOS technology. The block diagram of a single-ended SCPA is shown in figure 3.12.

![Figure 3.12: The block diagram of a single-ended SCPA](image)

The digital code is used to enable or disable PA cells, and the number of enabled cells is proportional to the output amplitude. For the enabled cells, the inverter passes a square wave, switching between $V_{DD}$ and $GND$ at the RF carrier frequency, to the left plate of the capacitor. For the disabled cells, the inverter passes a constant $GND$ to the left plate of the capacitor. So this PA can be modeled as capacitors with different input voltage sources in parallel with a matching network, as shown in figure 3.13.

![Figure 3.13: The simplified model of a single-ended SCPA](image)

In this model, the enabled cells are driven by a square wave source switching between $V_{DD}$ and $GND$ and the other cells are connected to ground, which together form a capacitive voltage divider. Assuming $n$ cells are enabled out of a total of $N$ cells, the voltage at the top plate terminal $X$ is:

$$V_X = \frac{n}{N} \cdot V_{DD} \quad (3.7)$$

The linearity of the SCPA is superior to that of other DPAs. The above equation shows that the digital code information is directly represented as a voltage ratio. This ratio is very accurate because it relies on an accurate capacitor ratio, which is a native advantage of CMOS technology. In addition, it avoids using current sources, which are more difficult to design. At the same time, compared to other switch-based DPAs, the output impedance of the SCPA is constant and is composed of all the capacitances in parallel (the on-resistance of the switch is ignored in this ideal case). The Thevenin equivalent circuit is depicted in figure 3.14.

![Thevenin equivalent circuit of a single-ended SCPA](image)

**Figure 3.14:** The Thevenin equivalent circuit of a single-ended SCPA

Note that the $R_{opt}$ used in this model represents a theoretical resistor, which is transformed by the matching network from the real 50 Ω load. Furthermore, the series inductor $L$ is used to resonate with the summed capacitance $NC$. The resistance of this $R_{opt}$ is much lower than 50 Ω so that desired power level can be achieved. If the loss of this part of matching network can be ignored, the output power delivered to this theoretical resistor would approximate the power on the real load.

In this case, the voltage across $R_{opt}$ is considered as $V_{out}$. Because the matching network also serves as a bandpass filter, only the fundamental component at the RF carrier frequency appears across this resistor. Its amplitude is:

$$V_{out} = \frac{2}{\pi} \left(\frac{n}{N}\right) V_{DD}$$

(3.8)

So the output power is:

$$P_{out} = \frac{2}{\pi^2} \left(\frac{n}{N}\right)^2 \frac{V_{DD}^2}{R_{opt}}$$

(3.9)

The only power loss in this ideal case would be charging and discharging the capacitive voltage dividing network $P_{SC}$:

$$P_{SC} = C_{in} V_{DD}^2 f$$

(3.10)

where $C_{in}$ is the input capacitance of the capacitive voltage divider. If the switching edges of the square wave are sharp enough, the inductor $L$ can be considered as a current source, which behaves as an open circuit. So, $C_{in}$ is composed of 2 capacitor groups in series. The first group consists of all the capacitors of the enabled cells in parallel, and the second group consists of the capacitors of the disabled cells in parallel. This is shown in figure 3.15.

So, $C_{in}$ can be expressed as:

$$C_{in} = \frac{n(N - n)}{N} C$$

(3.11)

And the drain efficiency in this ideal case is:

\[ \eta = \frac{P_{\text{out}}}{P_{\text{out}} + P_{\text{SC}}} \]  

(3.12)
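Equations (3.9) to (3.12) can be evaluated together to see how the ideal efficiency varies with the number of enabled cells. The following is a minimal sketch of that calculation; the function name and all numerical values in the example (64 cells, 1 V supply, unit capacitance) are assumptions chosen for illustration only.

```python
import math

def scpa_ideal(n: int, N: int, vdd: float, r_opt: float, c_unit: float, f: float):
    """Return (P_out, P_SC, efficiency) of the ideal SCPA model, eqs. (3.9) to (3.12)."""
    p_out = (2.0 / math.pi**2) * (n / N) ** 2 * vdd**2 / r_opt   # eq. (3.9)
    c_in = n * (N - n) / N * c_unit                              # eq. (3.11)
    p_sc = c_in * vdd**2 * f                                     # eq. (3.10)
    eta = p_out / (p_out + p_sc) if p_out > 0 else 0.0           # eq. (3.12)
    return p_out, p_sc, eta

# Assumed example: N = 64 cells, 1 V supply, R_opt = 13 ohm, 52.5 fF per cell, 900 MHz.
for n in (8, 16, 32, 64):
    p_out, p_sc, eta = scpa_ideal(n, 64, 1.0, 13.0, 52.5e-15, 900e6)
    print(f"n = {n:2d}: P_out = {p_out * 1e3:.3f} mW, P_SC = {p_sc * 1e3:.3f} mW, eta = {eta:.3f}")
```

With all cells enabled, $C_{in}$ is zero and the ideal efficiency is 100%. At back-off, $P_{out}$ falls with $(n/N)^2$ while $P_{SC}$ does not fall as quickly, so the efficiency drops.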

#### 3.3.2 SCPA design and optimization

In the practical case, more losses should be considered, which affect the output power and efficiency:

- The on-resistance of the transistor switch.
- The driver power consumption.
- The matching network includes inductors, which have a limited quality factor.
- Bonding wire can be modeled as an inductor and has limited quality factor.
- Charging and discharging capacitors.

To include these losses, a more straightforward model is shown in figure 3.16.

\( R_{\text{on}} \) represents the total on-resistance of all unit cells. \( C_1 \) is the total capacitance, which is \( NC \). The matching network is a “T-matching” network consisting of two inductors \( L_1, L_2 \) and one capacitor \( C_2 \) shunted to ground. Moreover, the left inductor \( L_1 \) can be divided into 2 parts: \( L_3 \) and \( L_4 \). \( L_3 \) is used to resonate with \( C_1 \). \( L_4, L_2 \) and \( C_2 \) form a sub-“T-matching” network, the purpose of which is to transform \( R_L \) to \( R_{\text{opt}} \). For simplicity, the bonding wire, which connects \( C_1 \) and \( L_1 \), is not shown in this figure. It should be mentioned that although the driver power consumption is not drawn in figure 3.16, it is actually included in this model through $R_{on}$, because the input capacitance of the PA is inversely proportional to the on-resistance $R_{on}$.

To show all the losses more clearly, this model can be processed further. If we combine $R_{on}$ and the equivalent series resistance of $L_1$ and the bonding wire into $R_{loss1}$, and denote the equivalent series resistance of $L_2$ as $R_{loss2}$, a model of the SCPA showing all the losses is obtained. This model, shown in figure 3.17, can be used for practical design and optimization.

![Figure 3.17: A model of SCPA showing all the losses.](image)

It should also be noted that in further design and optimization, although the driver power consumption is not drawn in figure 3.17, it is included in this model by having a relationship with $R_{loss1}$.

The output power based on this model can be shown in the equation below:

$$P_{out} = \left( \frac{2}{\pi} \cdot \frac{n}{N} \cdot \frac{V_{DD}}{\sqrt{2}} \right)^2 \cdot \frac{1}{R_{opt} + R_{loss1}} \cdot \frac{R_{opt}}{R_{opt} + R_{loss1}} \cdot \frac{R_L}{R_L + R_{loss2}} \quad (3.13)$$

If the quality factor of the sub-“T-matching” network including $L_4$, $L_2$ and $C_2$ is fixed at a reasonable value, for example 3 [12], and all the inductors in the matching network have a quality factor of 30, then three observations can be made:

1. The values of $L_4$, $L_2$ and $C_2$ are all functions of $R_{opt}$.
2. $R_{loss2} = f(R_{opt})$.
3. $R_{loss1} = f(R_{opt}, R_{on}, C_1)$, bond wire loss is already known.

In conclusion,

$$P_{out} = f(R_{opt}, R_{on}, C_1, n) \quad (3.14)$$
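Equation (3.13) can be written as a small function to make the role of each loss term explicit. This is only a sketch: $R_{loss1}$ and $R_{loss2}$ are passed in directly rather than derived from $R_{opt}$, $R_{on}$ and $C_1$, and the supply voltage and loss values in the example are assumed, not taken from the design.

```python
import math

def scpa_pout(n: int, N: int, vdd: float, r_opt: float,
              r_loss1: float, r_load: float, r_loss2: float) -> float:
    """Output power in watts according to eq. (3.13)."""
    v_rms = (2.0 / math.pi) * (n / N) * vdd / math.sqrt(2.0)
    # Power reaching R_opt after the series loss R_loss1.
    p_ropt = v_rms**2 / (r_opt + r_loss1) * r_opt / (r_opt + r_loss1)
    # Fraction of that power reaching the real load after the loss R_loss2.
    return p_ropt * r_load / (r_load + r_loss2)

def to_dbm(p_watt: float) -> float:
    """Convert power in watts to dBm."""
    return 10.0 * math.log10(p_watt / 1e-3)

# Assumed example: full code, 1 V supply, R_opt = 13 ohm, R_loss1 = 3 ohm,
# R_L = 50 ohm, R_loss2 = 2 ohm.
p = scpa_pout(64, 64, 1.0, 13.0, 3.0, 50.0, 2.0)
print(f"P_out = {p * 1e3:.2f} mW ({to_dbm(p):.2f} dBm)")
```

With both loss resistances set to zero, the function reduces to the ideal expression of eq. (3.9).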

The next step is to find suitable values for all foregoing components.

The first design target is to achieve a continuous peak output power higher than 7 dBm. So, the design starts at the peak output power situation. In this case, $n = N$ and $R_{on}$ is set to a reasonable number, for instance 2 $\Omega$. Why 2 $\Omega$ is a suitable value is explained in the later discussion. Moreover, $n$ should actually not be exactly equal to $N$, which is also explained later. For the time being, we continue with the aforementioned assumptions.

First, the relationship among $P_{out}$, $R_{opt}$ and $C_1$ is evaluated. Figure 3.18 shows how the peak output power is influenced by $R_{opt}$ and $C_1$.

This figure shows the following: When $C_1$ is fixed, the highest output power is achieved when $R_{opt}$ is equal to the “source” impedance $R_{on}$, which is 2 Ω. When $R_{opt}$ is fixed, a larger $C_1$ leads to larger output power. The reason is that $C_1$ and $L_3$ resonate with each other, so a larger $C_1$ means a smaller $L_3$ and therefore a smaller $R_{loss1}$, while $R_{loss2}$ is fixed (see observation 2).

Because the target is 7 dBm, a plane at 7 dBm is used to distinguish the qualified points from the others. The dark blue points in the figure belong to the 7 dBm plane. One qualified point $(x, y, z) = (22, 4.8, 7.091)$ is marked as an example.

If the losses listed at the beginning of section 3.3.2 are considered, the peak efficiency can also be calculated. It is shown in figure 3.19. Moreover, the power consumption of the buffers closest to the SCPA, which drive it, is also included in the efficiency calculation.

This figure confirms that a larger $R_{opt}$ leads to higher efficiency, and that a larger $C_1$ means a smaller $R_{loss1}$ and higher efficiency. Moreover, the 7 dBm plane is also depicted. The previously marked point turns out to have the highest peak efficiency, 67.2%.

However, the peak efficiency is not the most critical specification. Modern communication standards, e.g., 802.11ah, adopt complex-modulated baseband data, such as OFDM, whose average power is located 10 dB below the peak power (PAPR = 10 dB). This, in turn, leads to an average system efficiency that is inferior to the peak efficiency. As a result, the average efficiency is a more critical design parameter than the peak efficiency. Considering QPSK with OFDM, if enough data is transmitted, there should be an almost fixed probability distribution of the output power, or equivalently of the number of enabled cells. This distribution is shown in figure 3.20.

So, for a certain point $(x, y) = (R_{opt}, C_1)$, the efficiency at each output power can be calculated just like at peak output power except for a different $n$. Based on the above probability distribution pattern, these efficiencies can be used to calculate an average efficiency of this point $(x, y) = (R_{opt}, C_1)$. The probability is treated as a weight.

Figure 3.19: The peak efficiency of SCPA with different $R_{opt}$ and $C_1$.

Figure 3.20: The probability distribution pattern of QPSK with OFDM.

So, for all the points, a figure of average efficiency can be plotted, which is shown in figure 3.21.
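The averaging step described above, in which the probability of each output level is used as a weight on the efficiency at that level, can be sketched as follows. The function name and the example distribution are hypothetical; the actual distribution is the one in figure 3.20.

```python
def average_efficiency(probabilities, efficiencies):
    """Probability-weighted mean of the per-level efficiencies."""
    if len(probabilities) != len(efficiencies):
        raise ValueError("one efficiency value is needed per probability")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(p * e for p, e in zip(probabilities, efficiencies))
    return weighted / total  # normalize in case the weights do not sum to 1

# Hypothetical example with four output levels (not the data of figure 3.20).
probs = [0.40, 0.35, 0.20, 0.05]   # low power levels occur most often
effs = [0.15, 0.30, 0.45, 0.62]    # efficiency rises with output power
print(f"average efficiency = {average_efficiency(probs, effs):.3f}")
```

Because the peak level carries a small weight, a design point with the best peak efficiency does not necessarily give the best average efficiency.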

Points belonging to the 7 dBm plane are also shown. The shape resembles that of the peak efficiency. But the previously marked point no longer gives the highest efficiency, because the weight of the peak output power is low.

The region of points giving the highest average efficiency can be easily found, so this design and optimization is almost finished. The next step is to pick one suitable point to obtain the values of $R_{\text{opt}}$ and $C_1$.

But before doing that, recall that 2 questions from the beginning of the design remain unsettled. First, $n$ should not be exactly equal to $N$ at peak output power because the PA stage uses an additional binary bank, which is discussed at the end of the last chapter and shown in figure 2.23. The resulting power loss can be taken into account additionally. Second, what is a suitable value for $R_{\text{on}}$? The output power can be evaluated for different values of $R_{\text{on}}$. Figure 3.22 shows the peak output power with $R_{\text{on}} = 0.5$ Ω or 2.4 Ω. Points belonging to the 7 dBm plane are also shown.

It appears that a lower $R_{\text{on}}$ leads to higher output power. But the average efficiency still needs to be evaluated. Figure 3.23 shows the average efficiency with $R_{\text{on}} = 0.5$ Ω or 2.4 Ω.

It can be observed that a smaller $R_{\text{on}}$ leads to lower average efficiency for most points, except when $R_{\text{opt}}$ is very small. But for a “T-matching” impedance transformation network like $L_4$, $L_2$ and $C_2$, it can be proven that $C_2$ is inversely proportional to the impedance transformation ratio. This means a very low $R_{\text{opt}}$ would require a very high $C_2$. So, on the surface of $R_{\text{on}} = 0.5$ Ω, points that have high efficiency but require a very low $R_{\text{opt}}$ are not feasible. Therefore, $R_{\text{on}} = 0.5$ Ω is not a suitable option.

When $R_{\text{on}}$ becomes higher, high average efficiency is difficult to obtain. Although it is not shown here, an $R_{\text{on}} > 7$ Ω can no longer provide the peak output power larger than 7 dBm. So, in this project, $R_{\text{on}}$ is finally set to be 2.4 Ω considering all the foregoing limitations.

Figure 3.22: The peak output power of SCPA with different $R_{on}$.

Figure 3.23: The average efficiency of SCPA with different $R_{on}$.

The next step is to choose one point from the surface of \( R_{on} = 2.4 \, \Omega \). In addition to the average efficiency, practical component values along with the ESD constraints limit the design options. The final selected point is \( (R_{opt}, C_1) = (13 \, \Omega, 3.36 \, pF) \). The details are discussed below.

All the passive components should have reasonable values. \( L_1, L_2 \) and \( C_2 \) are simulated with \( R_{on} = 2.4 \, \Omega \), which are shown in figure 3.24. At the selected point \( (R_{opt}, C_1) = (13 \, \Omega, 3.36 \, pF) \), component values are reasonable.

![Figure 3.24: \( L_1, L_2 \) and \( C_2 \) values.](image)

For the ESD model shown in figure 3.25, the \( V_{pad} \) should be \(-0.7V < V_{pad} < 4V\).

![Figure 3.25: The SCPA with ESD model.](image)

Furthermore, the voltage swing at \( V_{pad} \) is calculated and shown in figure 3.26. It can be seen that the voltage swing meets the requirement at point \( (R_{opt}, C_1) = (13 \, \Omega, 3.36 \, pF) \).

In summary, for a single-ended SCPA, the optimized component parameters in figure 3.16 are: \( R_{on} = 2.4 \, \Omega, C_1 = 3.36 \, pF, L_1 = 13.4 \, nH, L_2 = 11.18 \, nH, C_2 = 5.802 \, pF \). In this case, the optimized resistance is \( R_{opt} = 13 \, \Omega \). The peak output power is 8.4 dBm while the efficiency is 62.5%. The average efficiency is 36.52%. When considering the power loss introduced by the extra PA binary bank, the peak output power is around 7.48 dBm. Thus, the peak and average efficiency are both lower. It should be noted that the output parasitic capacitance of the PA is not included in the optimization model because it is much smaller than \( C_1 \).

### 3.4 Sign bit selector

Based on figure 2.13, the DPA has to handle 4 input signals: \( CK_i \), \( CK_q \), I\_EN and Q\_EN. From figure 2.18, the digital code signals I\_EN and Q\_EN would be retimed. However, correct phases still need to be selected for clock signals: \( CK_i \) and \( CK_q \).

This is done by the module called “Sign bit selector”. Here the overview of this module is replotted for analysis.

![Sign bit selector module overview](image)

#### 3.4.1 Divider design

The divider is used to divide the 1.8 GHz differential clock signals, \( CK \) and \( CKB \), into 4 phases of 900 MHz signals: \( CK1 \), \( CK2 \), \( CK3 \) and \( CK4 \). The inputs and outputs are all square waves. The divider is shown in figure 3.28.

This divider is separated into 2 parts by a dashed line. The first part is the main divider, based on a ring oscillator, which generates a 900 MHz signal. The second part operates as a clock retimer, which generates the 4 phases.

The main divider is modified from the design in [33]. It combines the large locking range of dynamic logic dividers with the low power of injection-locked dividers. To illustrate how this part operates, its signal waveforms are shown in figure 3.29(a).

The second part is a clock retimer, composed of 5 clocked-CMOS ($C^2$MOS [34]) latches. The first four latches produce the 4 phases and the last one is a dummy latch, which provides the 4th latch with a load similar to that of the other ones. Its waveform is shown in figure 3.29(b).

Other frequency divider designs might also work, but the loop structure based on 2 D-flip-flops, shown in figure 3.30(a), either brings a large delay mismatch, as with TSPC [34] flip-flops, shown in figure 3.30(b), or requires an additional start-up circuit, as with the transmission-gate implementation shown in figure 3.30(c). So, these designs are avoided.

#### 3.4.2 MUX design

MUXes are used to select $CK_i$ from CK1 and CK3 and $CK_q$ from CK2 and CK4 for the power amplifier. Simultaneously, complementary phases of $CK_i$ and $CK_q$ are also used for differential purpose. So in total 4 MUXes are required.

![Diagram of other frequency divider designs](image1)

**Figure 3.30:** Other frequency divider designs.

This MUX is also based on the clocked-CMOS structure [34], which makes it convenient to lay out together with the divider. One MUX schematic is shown in figure 3.31 to illustrate how $CK_i$ is selected for the positive branch. The control signal $Ctrl$ is $I_{MSB}$, and $CtrlB$ is its inverted version, generated by an inverter.

![MUX schematic](image2)

**Figure 3.31:** The MUX schematic.

To generate $CK_i\_N$, the same structure is used except with exchanged inputs. In this way, $CK_q\_P$ and $CK_q\_N$ can also be generated with different input and control signals.

### 3.5 Conclusion

In this chapter, several design considerations of PA are first introduced. Because this project focuses on the digital power amplifier, several state-of-the-art DPA architectures are reviewed. In this project, a switched-capacitor power amplifier (SCPA) is utilized. This chapter then illustrates the principle behind the SCPA. The whole design process and optimization of the SCPA are discussed in detail. In the end, the divider and MUX design in the sign bit selector are introduced.
Chapter 4

Digital processor design

For a fully digital transmitter, the digital processor is an important part. It handles the input baseband I and Q data and generates the required digital code for the entire TX. In this project, it is first written entirely in VHDL and then synthesized. The overview of this digital module is shown in figure 2.14. In this chapter, the data demux and upsampling are introduced first. Then the code converter and the saturater are briefly discussed. Other modules, like the sigma-delta modulator or the decoder, are mostly reused from previous work, so they are not discussed.

### 4.1 Data demux and upsampling

The data demux and upsampling are realized together as the first part in the digital processor, which is shown in figure 4.1.

The input is combined I/Q data at a double data rate of 64 MHz. The purpose of this module is to separate I and Q into two branches and upsample them to $f_o/4$ (around 224 MHz). The upsampling suppresses the spectral sampling replicas so that the spectral mask can be met.

The operation of this module can be illustrated in 3 steps:

1. The input data first needs to be synchronized with the $f_o/4$ clock. Because the input data is provided by the FPGA while the $f_o/4$ clock comes from the ADPLL, alignment is necessary. This operation is shown in figure 4.2(a). The data is synchronized to the higher clock domain, which is the actual clock domain of the processor. The input clock and the input data coming from the FPGA are both treated as signals and are synchronized using two flip-flops clocked at 224 MHz.

2. I and Q need to be selected from the aligned input data, as shown in figure 4.2(b). Since I and Q each have a single data rate of 32 Mb/s, whose period is 7 times that of the $f_o/4$ signal, a counter is used to count from 1 to 7 repeatedly. At a specific moment set by the constant I_capture, the I signal is selected from the synchronized input. The Q signal is selected in the same way with the constant Q_capture. One advantage of this design is that these 2 constants, coming from the FPGA, are programmable. In practice, this tunability ensures that I and Q are captured at the correct moments, which mitigates metastability issues.

3. Because I and Q are captured at different moments, alignment is needed, which is shown in figure 4.2(c). Just like step 2, a constant called “Sync” is used to set a moment based on the counter timing, at which I and Q are both valid outputs.
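
Steps 2 and 3 can be summarized with a small behavioral model. This is only a sketch in Python, not the VHDL of this design: the function name, the zero reset values, the assumption that the input is already synchronized to the fast clock, and the example capture constants are all hypothetical.

```python
def demux_upsample(aligned, i_capture, q_capture, sync):
    """Behavioral model of the I/Q demux running on the fast (f_o/4) clock.

    aligned: samples already synchronized to the fast clock (step 1).
    Returns one (I, Q) pair per fast clock cycle, held between updates.
    """
    i_reg = q_reg = 0          # capture registers (assumed reset value 0)
    i_out = q_out = 0          # aligned outputs
    out = []
    for k, sample in enumerate(aligned):
        counter = k % 7 + 1    # counter runs 1..7 repeatedly
        if counter == i_capture:
            i_reg = sample     # step 2: capture I
        if counter == q_capture:
            q_reg = sample     # step 2: capture Q
        if counter == sync:
            i_out, q_out = i_reg, q_reg   # step 3: release I and Q together
        out.append((i_out, q_out))
    return out

# Hypothetical input: each I/Q pair spans 7 fast cycles, I first, then Q.
pairs = [(10, 20), (30, 40)]
aligned = [v for i, q in pairs for v in [i] * 4 + [q] * 3]
out = demux_upsample(aligned, i_capture=2, q_capture=6, sync=7)
print(out[6], out[13])
```

In this model the outputs change only at the Sync moment and are held for the following 7 fast clock cycles, which corresponds to the upsampling of I and Q to the $f_o/4$ rate.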

![Figure 4.2: The operation of data demux and upsampling.](image)

### 4.2 FIR filter

To further suppress the clock image and improve the spectral purity, a simple FIR filter is used. The number of taps for this FIR filter is 13 to achieve a trade-off between performance and power consumption. The impulse response is shown in figure 4.3.

### 4.3 Code converter

After the FIR filter, the I or Q signal is an 11-bit code in 2’s complement. Because the analog circuit treats the sign and the amplitude completely separately, another code format is required, in which the MSB still stands for the sign but the remaining bits only represent the amplitude. That means the only difference between a positive number and a negative number with the same amplitude is the sign bit. This makes the decoder design convenient because it only has to handle the amplitude part. In other words, it decodes positive numbers and negative numbers in the same way.

Some examples of the conversion from 2’s complement to the required format are shown in table 4.1.

<table>
<thead>
<tr>
<th>Signed integer</th>
<th>2’s complement</th>
<th>required format</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>00000000011</td>
<td>00000000011</td>
</tr>
<tr>
<td>2</td>
<td>00000000010</td>
<td>00000000010</td>
</tr>
<tr>
<td>1</td>
<td>00000000001</td>
<td>00000000001</td>
</tr>
<tr>
<td>0</td>
<td>00000000000</td>
<td>00000000000</td>
</tr>
<tr>
<td>-1</td>
<td>11111111111</td>
<td>10000000001</td>
</tr>
<tr>
<td>-2</td>
<td>11111111110</td>
<td>10000000010</td>
</tr>
<tr>
<td>-3</td>
<td>11111111101</td>
<td>10000000011</td>
</tr>
</tbody>
</table>

Table 4.1: Conversion from 2’s complement to the required format
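
The conversion in table 4.1 is a 2's complement to sign-magnitude conversion. A minimal sketch of it is given below in Python rather than VHDL; the function name is hypothetical, and the handling of the most negative code (which has no 10-bit magnitude) by saturation is an assumption, since the source does not specify it.

```python
def twos_complement_to_sign_magnitude(code: int, bits: int = 11) -> int:
    """Convert a `bits`-wide 2's complement code to sign bit + magnitude."""
    mask = (1 << bits) - 1
    code &= mask
    sign = code >> (bits - 1)
    if sign == 0:
        return code                      # non-negative codes are unchanged
    magnitude = (-code) & mask           # 2's complement negation
    max_magnitude = (1 << (bits - 1)) - 1
    if magnitude > max_magnitude:
        magnitude = max_magnitude        # assumption: saturate the most negative code
    return (1 << (bits - 1)) | magnitude

# Reproduce the negative rows of table 4.1.
for value in (-1, -2, -3):
    raw = value & 0x7FF                  # 11-bit 2's complement pattern
    converted = twos_complement_to_sign_magnitude(raw)
    print(f"{value:3d}: {raw:011b} -> {converted:011b}")
```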

### 4.4 Saturater

The saturater guarantees that the sum of the I and Q codes is not larger than 63, which basically realizes a clipping effect. In this project, if the sum of the I and Q codes is larger than 63, the smaller code is maintained and the larger code is replaced by 63 minus the smaller code. For example, if $I + Q > 63$ and $|I| < |Q|$, then I stays as $I$ and Q becomes $63 - I$.
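
The clipping rule can be sketched as follows. The sketch operates on the amplitude parts of the codes only and assumes both are in the range 0 to 63; the function name and the tie-breaking choice (I is kept when both amplitudes are equal) are assumptions, since the source does not state them.

```python
def saturate(i_mag: int, q_mag: int, limit: int = 63):
    """Clip so that i_mag + q_mag <= limit, keeping the smaller amplitude unchanged."""
    if i_mag + q_mag <= limit:
        return i_mag, q_mag              # no clipping needed
    if i_mag <= q_mag:
        return i_mag, limit - i_mag      # I is smaller (or equal): reduce Q
    return limit - q_mag, q_mag          # Q is smaller: reduce I

print(saturate(20, 50))   # I kept, Q reduced
print(saturate(50, 20))   # Q kept, I reduced
print(saturate(10, 20))   # unchanged
```

After clipping, the two amplitudes always sum to exactly the limit, but the ratio of I to Q is changed.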

This clipping method actually leads to inferior EVM because correct codes are sometimes clipped. This effect can also be seen in the output spectrum. Other clipping methods can be explored in future work. For example, if the sum of the I and Q codes is larger than 63, they can be clipped to a sum of 63 with the ratio of I over Q staying the same. But this might result in more power consumption in the digital processor because multiplication may be inevitable.

### 4.5 Conclusion

This chapter briefly discussed the principles of the data demux, code converter and saturater of the digital processor. In the future, more innovative research in this module can be explored.
Chapter 5

Simulation results

The design of this transmitter has been fully described in the previous chapters. In this chapter, the important simulation results are presented and discussed. First, the output power and efficiency of the power amplifier are discussed. Then, the linearity of the digital TX is analyzed. Finally, a power breakdown of the whole transmitter is given. Note that all simulations include the bonding inductance.

5.1 Output power and efficiency

The output power and efficiency of the power amplifier are shown in figure 5.1.

![Figure 5.1: Output power versus the efficiency of the power amplifier at 900 MHz.](image)

The peak output power is 7.04 dBm and the peak PAE is 49.6%. With the probability distribution shown in figure 3.20, the average output power and efficiency are -0.9 dBm and 23.0%, respectively.
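The way such averages can be obtained from a probability distribution is sketched below in Python. The exact definition used for the numbers above is not restated here, so the sketch assumes that the average efficiency is the ratio of average output power to average consumed power. The function names and the example values are hypothetical and are not simulation data.

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to mW."""
    return 10 ** (p_dbm / 10)


def mw_to_dbm(p_mw: float) -> float:
    """Convert a power level from mW to dBm."""
    return 10 * math.log10(p_mw)


def average_power_and_efficiency(prob, pout_mw, pdc_mw):
    """Weight output and consumed power per code by its probability.

    Assumption: average efficiency = average Pout / average consumed power.
    """
    avg_out = sum(p * po for p, po in zip(prob, pout_mw))
    avg_dc = sum(p * pd for p, pd in zip(prob, pdc_mw))
    return mw_to_dbm(avg_out), avg_out / avg_dc


# Illustrative two-level example (made-up values, not from the simulation).
avg_dbm, avg_eff = average_power_and_efficiency([0.5, 0.5], [1.0, 3.0], [4.0, 6.0])
print(round(avg_dbm, 2), round(avg_eff, 2))
```

Averaging must be done on linear power in mW rather than on dBm values, which is why the conversion helpers are applied before and after the weighted sum.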
5.2 Linearity

To evaluate the linearity of the transmitter, the amplitude and phase of the output voltage are first simulated and shown in figure 5.2. The upper plot shows the output voltage when Q is set to 0 and I increases from 0 to 63. The lower plot shows the output voltage when I is set to 0 and Q increases from 0 to 63.

![Figure 5.2: Output voltage of the transmitter versus the digital code.](image)

The amplitudes of I and Q are close to linear, and the phases of I and Q are constant with a 90° phase difference between them. Based on the amplitude, the INL is calculated using eq. 3.3 and is shown in figure 5.3. The maximum INL is 0.53 LSB.

![Figure 5.3: INL of the power amplifier.](image)
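As an illustration of how an INL curve can be derived from the simulated amplitudes, the Python sketch below uses the common endpoint-fit definition. Eq. 3.3 is not reproduced here, so this definition is an assumption and may differ in detail from the one used for figure 5.3. The function name and the sample data are hypothetical.

```python
def inl_endpoint(amplitudes):
    """Return the INL in LSB for each code, using an endpoint fit.

    The ideal transfer curve is the straight line through the first and
    last amplitude; the LSB is the average step of that line.
    """
    n = len(amplitudes)
    lsb = (amplitudes[-1] - amplitudes[0]) / (n - 1)
    return [(a - amplitudes[0]) / lsb - k for k, a in enumerate(amplitudes)]


# Made-up 2-bit example: code 1 is half an LSB too high.
inl = inl_endpoint([0.0, 1.5, 2.0, 3.0])
print(max(abs(x) for x in inl))
```

With an endpoint fit, the INL is zero at the first and last codes by construction, so the reported maximum always occurs at an intermediate code.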

A dynamic simulation is performed with 8 MHz 64-QAM data packets. The far-out view of the output spectrum is shown in figure 5.4(a) and the close-in view in figure 5.4(b). The EVM is -28 dB (4%).
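The relation between the EVM in dB and in percent follows from the usual definition $\mathrm{EVM_{dB}} = 20 \log_{10}(\mathrm{EVM})$ and can be checked with a few lines of Python. The function names are chosen only for this illustration.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert EVM from dB to percent (EVM is an amplitude ratio)."""
    return 100 * 10 ** (evm_db / 20)


def evm_percent_to_db(evm_percent: float) -> float:
    """Convert EVM from percent to dB."""
    return 20 * math.log10(evm_percent / 100)


print(round(evm_db_to_percent(-28), 2))  # close to 4 %
```

Because the EVM is a ratio of error-vector amplitude to reference amplitude, a factor of 20 rather than 10 is used in the logarithm.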

![Relative RF Output Spectrum](image)

(a) Far-out output spectrum

![Relative RF Output Spectrum](image)

(b) Close-in output spectrum

**Figure 5.4:** Output spectrum of the transmitter with 8 MHz 64-QAM data packets

The output spectrum passes the close-in spectral mask with a margin of approximately 5 dB. The far-out spectral mask is not fully met, primarily because the upsampling frequency is set to $f_o/4$. As discussed in the previous chapter, the output spectrum is heavily affected by the clipping caused by the saturater. The close-in view of the output spectrum without the saturater is shown in figure 5.5.

Without the saturater, the margin is at least 10 dB and the EVM is -35 dB. However, the saturater is necessary for the rest of the circuit to operate correctly. Therefore, other clipping methods should be explored in future work to reduce the degradation caused by the saturater.

The 64-QAM constellation is shown in figure 5.6.
Figure 5.5: Close-in view of the output spectrum without the saturater.

Figure 5.6: The 64-QAM constellation.
5.3 Power breakdown

The power breakdown is shown in figure 5.7. The post-layout simulation is performed at the average output power level. Because this work reuses the ADPLL of previous work [16] and the digital processor performs similar functions, the power consumption of these two modules is taken directly from [16] as an estimate to save simulation time. The most power-consuming block is the power amplifier.

![Power Breakdown Diagram](image)

Figure 5.7: The power breakdown.

5.4 Comparison with state-of-the-art works

Table 5.1 summarizes the performance of the proposed fully digital IQ-sharing transmitter and compares it with other state-of-the-art digital transmitters. Note that the performance of this work is based on simulation, while that of the other works is based on measurement.

5.5 Layout overview

The layout overview is shown in figure 5.8, with the large modules marked.
<table>
<thead>
<tr>
<th>Parameter</th>
<th>This work</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>40 nm</td>
<td>40 nm</td>
<td>28 nm</td>
<td>40 nm</td>
</tr>
<tr>
<td>Supply Voltage</td>
<td>1 V</td>
<td>1 V</td>
<td>1.1 V</td>
<td>3.3 V</td>
</tr>
<tr>
<td>Architecture</td>
<td>Digital quadrature</td>
<td>Digital polar</td>
<td>Digital quadrature</td>
<td>Digital quadrature</td>
</tr>
<tr>
<td>Modulation</td>
<td>OFDM(11ah)</td>
<td>OFDM(11ah)</td>
<td>LTE</td>
<td>802.11a/b/g/n</td>
</tr>
<tr>
<td>Carrier Frequency</td>
<td>900 MHz</td>
<td>900 MHz</td>
<td>800 MHz</td>
<td>2.4/5.5 GHz</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>8 MHz</td>
<td>1/2 MHz</td>
<td>10 MHz</td>
<td>40/80 MHz</td>
</tr>
<tr>
<td>Average Pout</td>
<td>2.1 dBm</td>
<td>0 dBm</td>
<td>6.97 dBm</td>
<td>19 dBm/18 dBm</td>
</tr>
<tr>
<td>System Efficiency</td>
<td>26.6%</td>
<td>14%</td>
<td>29.1%</td>
<td>9.5%/7.5%</td>
</tr>
<tr>
<td>Peak Pout</td>
<td>10 dBm</td>
<td>8 dBm</td>
<td>13.9 dBm</td>
<td>25.5/27 dBm</td>
</tr>
<tr>
<td>Peak Efficiency</td>
<td>49.6%</td>
<td>45%</td>
<td>40.4%</td>
<td>14.5%/14.1%</td>
</tr>
<tr>
<td>EVM</td>
<td>4%</td>
<td>4.4%</td>
<td>/</td>
<td>3%</td>
</tr>
</tbody>
</table>

![Figure 5.8: Layout overview.](image)
In this thesis, I worked as a part of the IMEC ultra-low-power RF team and contributed to the design of an energy-efficient IQ-sharing digital transmitter. The features of this transmitter are:

1. High system efficiency is achieved thanks to the proposed IQ-sharing architecture and PA optimization.
2. A 6-bit switched-capacitor power amplifier (SCPA) is realized, which leads to superior linearity. The quadrature nature of this transmitter results in a wide video bandwidth.

### 6.1 My contributions

My contributions in this project were:

1. System analysis and verification of the proposed transmitter model in MATLAB.
2. Modeling and optimizing the power amplifier in MATLAB.
3. Schematic design and verification of the power amplifier.
4. Implementation of the digital processor in VHDL.
5. Layout design for the whole transmitter (except for ADPLL).
6. Circuit design of a few modules, such as divider, MUX, etc.

### 6.2 Future work

At the time of writing this thesis, the tape-out has been completed. Future work should start with verifying the functionality of the digital TX by measuring the chip. System verification will require some FPGA programming.

Because the saturater degrades the EVM and the output spectrum, other clipping methods should be explored in future designs.
Bibliography




