

Delft University of Technology

#### Energy Efficient and Intrinsically Linear Digital Polar Transmitters

Hashemi, M.

DOI 10.4233/uuid:7ad707a5-2db6-4185-bd3a-7a97fcf74b23

Publication date 2020

**Document Version** Final published version

Citation (APA)

Hashemi, M. (2020). Energy Efficient and Intrinsically Linear Digital Polar Transmitters. [Dissertation (TU Delft), Delft University of Technology]. https://doi.org/10.4233/uuid:7ad707a5-2db6-4185-bd3a-7a97fcf74b23


# Energy Efficient and Intrinsically Linear Digital Polar Transmitters

Mohsen Hashemi



#### **ENERGY EFFICIENT AND INTRINSICALLY LINEAR DIGITAL POLAR TRANSMITTERS**


#### Dissertation

for the purpose of obtaining the degree of doctor at Delft University of Technology, by the authority of the Rector Magnificus, Prof. dr. ir. T.H.J.J. van der Hagen, chair of the Board for Doctorates, to be defended publicly on Monday 21 December 2020 at 10:00 o'clock

by

#### Mohsen HASHEMI

Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands, born in Ilam, Iran.

This dissertation has been approved by the promotor.

Promotor: Prof. dr. ing. Leo C. N. de Vreede

Composition of the doctoral committee:

| Member                             | Role / Affiliation             |
|------------------------------------|--------------------------------|
| Rector Magnificus                  | Chairman                       |
| Prof. dr. ing. Leo C. N. de Vreede | Delft University of Technology |

Independent members:

| Member                      | Affiliation                                    |
|-----------------------------|------------------------------------------------|
| Prof. dr. ir. P. G. M. Baltus | Eindhoven University of Technology           |
| Prof. dr. ir. B. Nauta      | University of Twente                           |
| Prof. dr. P. Wambacq        | Vrije Universiteit Brussel, Belgium            |
| Dr. S. Pires                | Ampleon B.V.                                   |
| Prof. dr. R. B. Staszewski  | Delft University of Technology                 |
| Dr. M. S. Alavi             | Delft University of Technology                 |
| Prof. dr. N. Llombart Juan  | Delft University of Technology, reserve member |





Keywords: polar, digital TX, digital power amplifier, Doherty power amplifier, digital predistortion, efficient, linear, wideband

Front: Polar representation of a nonlinearly sized transistor array.

Copyright © 2020 by Mohsen Hashemi

ISBN 978-94-6421-184-9

An electronic version of this dissertation is available at http://repository.tudelft.nl/.

To my parents and sisters, and to my lovely wife, Negin

Human beings are members of a whole, In creation of one essence and soul. If one member is afflicted with pain, Other members uneasy will remain. If you've no sympathy for human pain, The name of human you cannot retain!

Iranian poet Saadi of Shiraz, 1210-1291

# **CONTENTS**

- Summary
- Samenvatting
- 1 Introduction
  - 1.1 Main Transmitter Architectures
    - 1.1.1 Cartesian
    - 1.1.2 Outphasing (LINC)
    - 1.1.3 Polar
    - 1.1.4 Hybrid Architectures
  - 1.2 Transmitter Figures-of-Merit
    - 1.2.1 Efficiency
    - 1.2.2 Spectral Purity
    - 1.2.3 Signal Accuracy
  - 1.3 Analog-Intensive Transmitters
    - 1.3.1 Analog Cartesian TX
    - 1.3.2 Analog Polar TX
  - 1.4 Digital-Intensive Transmitters
    - 1.4.1 Digital Cartesian TX
    - 1.4.2 Digital Polar TX
  - 1.5 Design Challenges of a Wideband Efficient TX
  - 1.6 Thesis Objectives
  - 1.7 Thesis Outline
  - References
- 2 Towards the Optimum Digital Polar Transmitter
  - 2.1 Introduction
  - 2.2 Digital Polar TX
    - 2.2.1 Phase Modulation
    - 2.2.2 Amplitude Modulation
  - 2.3 Digital Power Amplifier
    - 2.3.1 Concept of the Switch-Mode PA
    - 2.3.2 Class-E PA
    - 2.3.3 Load-Insensitive Class-E PA
    - 2.3.4 Class-E DPA Array
    - 2.3.5 Modulated Efficiency vs. CW Efficiency
  - 2.4 Doherty Efficiency Enhancement
  - References
- 3 Nonlinear Systems and Digital Predistortion
  - 3.1 Introduction
  - 3.2 Behavioral Modeling of Nonlinear Systems
    - 3.2.1 Volterra series
    - 3.2.2 Equivalent Baseband Model of a Volterra Series
    - 3.2.3 Memory Polynomial Model
    - 3.2.4 Generalized Memory Polynomial Model
    - 3.2.5 Baseband Model of ACW-AM and ACW-PM Conversion
    - 3.2.6 General Mathematical Model
    - 3.2.7 New Basis Functions Proposals for a Switch-Mode DPA
    - 3.2.8 System Identification by LS Algorithm
  - 3.3 Digital Predistortion
    - 3.3.1 Mathematical DPD Model Extraction
    - 3.3.2 Indirect-Learning DPD
    - 3.3.3 Direct-Learning DPD
    - 3.3.4 Sampling Rate Requirement for DPD Model Extraction
    - 3.3.5 Challenges of DPD
    - 3.3.6 DPD-Less Linearization
  - 3.4 Conclusion
  - References
- 4 An Intrinsically Linear Wideband Polar Digital Power Amplifier
  - 4.1 Introduction
  - 4.2 Class-E DPA Linearity Analysis
    - 4.2.1 DC Characteristic Curve and Dynamic Load Lines
    - 4.2.2 Analysis of the ACW-AM and ACW-PM Distortion Mechanism
    - 4.2.3 Power and Efficiency Roll-off
  - 4.3 Proposed Linearization Techniques
    - 4.3.1 Nonlinear Sizing
    - 4.3.2 Overdrive Voltage Tuning for PVT Compensation
    - 4.3.3 Multiphase RF Clocking
    - 4.3.4 Harmonic Tuning for Efficiency Enhancement
  - 4.4 Implementation
    - 4.4.1 Class-E DPA with On-chip Matching Network
    - 4.4.2 Class-E DPA with Off-chip Matching Network
  - 4.5 Measurement Results
    - 4.5.1 Static (CW) Power/Efficiency Measurements
    - 4.5.2 Static Linearity Measurement by Triangle Signal
    - 4.5.3 Modulated Signal Measurement
  - 4.6 Conclusion
  - References
- 5 A Highly-Linear Wideband Polar Class-E CMOS Digital Doherty PA
  - 5.1 Introduction
  - 5.2 Wideband Class-E Doherty PA
    - 5.2.1 Reactance Compensated Parallel-Circuit Class-E PA
    - 5.2.2 Compensated Impedance Inverter
    - 5.2.3 Compensated Marchand Balun with Second Harmonic Control
  - 5.3 Digitally Controlled Class-E Doherty PA
  - 5.4 System Level Design Consideration
    - 5.4.1 Nonuniform quantization
    - 5.4.2 AM-PM Timing Mismatch
    - 5.4.3 Main-Peak Timing Mismatch
  - 5.5 Circuit-Level Linearization
    - 5.5.1 ACW-AM Correction
    - 5.5.2 ACW-PM Correction
  - 5.6 Implementation and Fabrication
    - 5.6.1 CMOS Chips
    - 5.6.2 Balun and Matching Network
    - 5.6.3 Overall Implementation
  - 5.7 Measurement Results
    - 5.7.1 Static Measurements
    - 5.7.2 Modulated Signal Measurements
  - 5.8 Conclusion
  - References
- 6 Digital Predistortion and System-level Considerations to further Push the Linearity Limits of a Polar DPA
  - 6.1 Introduction
  - 6.2 Linearity Limits of a polar DPA
    - 6.2.1 Aliasing of the Residual Sampling Spectral Replicas
    - 6.2.2 Nonuniform Quantization Noise
  - 6.3 Linearization of Digital Polar Transmitters
    - 6.3.1 Linearization using ILC with LUTs
    - 6.3.2 Proposed ILC-Inspired Direct-Learning DPD
  - 6.4 Measurement Results
  - 6.5 Conclusion
  - References
- 7 Conclusion
  - 7.1 Thesis Outcome
  - 7.2 Suggestions for Future Developments
- List of Acronyms
- List of Figures
- List of Tables
- Acknowledgements
- Curriculum Vitæ
- List of Publications

### **SUMMARY**

One of the biggest challenges in modern transmitter (TX) design, when moving from fourth-generation (4G) to fifth-generation (5G) communication networks, is to handle the increased linearity requirements without compromising the energy efficiency of the TX line-up. In analog systems, a high-quality TX signal can only be achieved by operating the (analog) power amplifier (PA) very linearly, which severely limits the achievable efficiency in practical TX line-ups. Alternatively, a nonlinear PA can be used, which is linearized by digital pre-distortion (DPD) circuitry. This latter approach is commonly used in (4G) macro-cell base stations, but it comes at the cost of increased system complexity and high supply power for the advanced DPD unit. In 5G handsets, or in massive multiple-input multiple-output (mMIMO) 5G base station units that facilitate beamforming and higher data rates for their end users, the required RF output power per individual transmitter is rather low (at most a few watts). However, since many more transmitters are used in 5G applications (e.g. 64x to 256x more than in 4G base stations), the use of an advanced DPD unit in each individual TX line-up, with its associated high power consumption, becomes simply impractical. Consequently, to address these changing needs, it is highly desirable to find new circuit-level TX solutions that overcome the traditional linearity-efficiency trade-off. To achieve this goal, this PhD work focuses on utilizing and tailoring digital device operation, as facilitated by advanced CMOS technologies, to the needs of modern wireless applications with their wideband complex modulated TX signals.
The circuit techniques developed within this thesis target an inherently linear transfer from amplitude code word (ACW) to TX output signal. They thereby either completely remove the need for a power-hungry advanced DPD unit or, for the most demanding applications (e.g. when handling large modulation bandwidths), allow a much simpler and consequently less power-hungry DPD unit. The circuit techniques developed in this thesis allow excellent drain and TX line-up efficiency, while being compatible with wideband efficiency enhancement techniques such as Doherty. The proposed circuit techniques are also able to correct for process, voltage, load, and temperature variations of the application.

The outline of this thesis work is as follows:

Chapter 1 provides an introduction to the field of wireless communication and the most common modulator architectures used to create complex modulated TX signals.

In Chapter 2, polar TX operation and digital polar TX architectures are discussed in more detail. Special attention is given to RFDAC-based solutions that can meet the needs of the phase modulator and the amplitude modulator when dealing with larger modulation bandwidths. The structure and design of class-E (D)PAs and Doherty DPA transmitters are also briefly described.

Chapter 3 gives an overview of behavioral modeling techniques for nonlinear systems. It covers Volterra series, memory-polynomial (MP) and generalized-MP (GMP) models, as well as parameter estimation techniques such as least-squares (LS) algorithms. It is shown that a real-signal passband nonlinearity can be translated to a complex-signal baseband nonlinearity, which provides the foundation for digital pre-distortion techniques that operate in baseband rather than at the RF fundamental frequency. To optimally handle the switch-mode DPA operation used in this thesis work, new basis functions are proposed that closely match the DPA nonlinearities and hence drastically reduce the order of the nonlinear kernels in the mathematical DPD description. In addition, various other aspects of DPD are described, including the theory of using undersampling techniques for nonlinear system identification and DPD model extraction.
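As an illustration of how a memory-polynomial model is identified with a least-squares algorithm, a minimal sketch is given below. It uses the conventional MP basis $x[n-m]\,|x[n-m]|^{k-1}$ and not the new basis functions proposed in this thesis. The function names (`mp_basis`, `fit_mp`), the model orders, and the synthetic coefficients are assumptions made purely for this example.

```python
import numpy as np

def mp_basis(x, K, M):
    """Build the memory-polynomial regressor matrix.

    Columns are x[n-m] * |x[n-m]|**(k-1) for memory taps m = 0..M
    and nonlinear orders k = 1..K (ordered tap by tap).
    """
    x = np.asarray(x, dtype=complex)
    cols = []
    for m in range(M + 1):
        # Delay the input by m samples, zero-padding the start.
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:len(x) - m]])
        for k in range(1, K + 1):
            cols.append(xm * np.abs(xm) ** (k - 1))
    return np.column_stack(cols)

def fit_mp(x, y, K, M):
    """Least-squares estimate of the MP coefficients from input x and output y."""
    Phi = mp_basis(x, K, M)
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coeffs

# Synthetic check: generate an output from known (hypothetical) coefficients
# and verify that the LS fit recovers them.
rng = np.random.default_rng(0)
x = (rng.standard_normal(2000) + 1j * rng.standard_normal(2000)) / 4
true_coeffs = np.array([1.0 + 0.1j, 0.0, -0.2 + 0.05j, 0.1j, 0.0, 0.02])  # K=3, M=1
y = mp_basis(x, 3, 1) @ true_coeffs
est_coeffs = fit_mp(x, y, 3, 1)
```

Because the model is linear in its coefficients, identification reduces to a single linear least-squares solve. The same structure applies when other basis functions are substituted for the columns of the regressor matrix.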

Chapter 4 introduces three novel circuit-level linearization techniques for switched-mode digital power amplifiers (DPAs), namely nonlinear sizing, overdrive-voltage control, and multiphase RF clocking. These techniques make it possible to avoid any kind of DPD in low-power applications (e.g. handheld mobile devices), or to greatly relax the DPD task in more demanding applications (such as wideband 5G base stations). They also allow digitally controlled fine-tuning of the amplitude-code-word (ACW)-AM and ACW-PM curves to compensate for variations in process, voltage, and temperature, operating frequency, and output load. As a theoretical foundation, the nonlinear behavior of a class-E DPA is thoroughly analyzed and closed-form equations are given to predict the ACW-AM and ACW-PM curves of the DPA. Two different linear DPA versions are designed, fabricated, and measured: one with an on-chip matching network (MN) and one with an off-chip MN based on a novel compensated Marchand balun.

In Chapter 5, an intrinsically linear wideband class-E CMOS Doherty DPA is presented. Closed-form equations are derived to predict its ACW-AM and ACW-PM curves. System-level considerations are provided that emphasize the importance of minimizing the timing mismatch between the peak and main DPAs for the ACW-AM and ACW-PM performance. The design and implementation of a novel off-chip matching/load network for the Doherty PA, based on a compensated Marchand balun with re-entrant coupled lines, are presented in detail. Using extended circuit-level linearization techniques for Doherty TX configurations, two separate chips with a comparable architecture but with different DPA parameters are designed and fabricated. The measured results confirm the foregoing theory and, owing to the uncompromised linearity-efficiency performance of the proposed method, set a new state of the art in DPD-free TX operation in terms of linearity and efficiency.

In Chapter 6, the theory related to the linearity-limiting factors of a digital polar transmitter (DTX) is given. Two less-studied but significant system-level factors for DTX operation, namely nonuniform quantization noise and spectral sampling replicas (SSR) of the PM signal, are investigated, and practical solutions to these are presented. By combining the proposed circuit-level linearization techniques with digital pre-distortion based on the iterative learning control (ILC) technique, the maximum achievable linearity performance was found to be close to the theoretical quantization noise limits, as confirmed by measurements. Furthermore, a novel real-time direct-learning DPD inspired by the ILC technique is proposed, which, in contrast to conventional direct-learning DPDs, directly extracts its parameters using an LS algorithm. This approach has a very low computational overhead and makes it possible to meet the most demanding linearity-bandwidth requirements with the lowest supply power requirements.
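The idea behind iterative learning control can be sketched with its textbook update rule, $u_{k+1} = u_k + \gamma\,(d - f(u_k))$, in which the input $u$ is refined over repeated trials until the output of the nonlinear system $f$ matches the desired signal $d$. The sketch below is a minimal illustration under stated assumptions, not the DPD implementation of this thesis: the memoryless amplifier model `pa_model`, the learning gain, and the test signal are all hypothetical.

```python
import numpy as np

def pa_model(u):
    """Hypothetical memoryless amplifier: mild gain compression plus
    amplitude-dependent phase rotation (assumed for illustration only)."""
    a = np.abs(u)
    return u * (1.0 - 0.1 * a**2) * np.exp(1j * 0.2 * a**2)

def ilc_predistort(d, plant, gain=0.8, iterations=30):
    """Iteratively refine the input u so that plant(u) approaches d.

    Update rule: u <- u + gain * (d - plant(u)), applied per sample.
    """
    u = np.array(d, dtype=complex)      # start from the desired signal itself
    for _ in range(iterations):
        error = d - plant(u)            # output error of the current trial
        u = u + gain * error            # correct the input for the next trial
    return u

# Desired complex baseband signal with peak amplitude 0.6 (arbitrary test case).
n = np.arange(64)
d = 0.6 * np.abs(np.sin(2 * np.pi * n / 64)) * np.exp(1j * 2 * np.pi * 3 * n / 64)

u = ilc_predistort(d, pa_model)
residual = np.max(np.abs(pa_model(u) - d))          # error with predistortion
uncorrected = np.max(np.abs(pa_model(d) - d))       # error without predistortion
```

The iteration yields the ideal predistorted waveform for one specific signal. A direct-learning DPD of the kind described above would then fit model parameters to such input-output pairs, for example with an LS solve, so that the correction generalizes to other signals.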

Finally, Chapter 7 draws the conclusions of this thesis and provides some suggestions for future research and developments.

## SAMENVATTING

In the transition from fourth-generation (4G) to fifth-generation (5G) communication networks, one of the biggest challenges is to design a modern transmitter (TX) that meets the increased linearity requirements without increasing energy consumption. In analog systems, a high-quality TX signal can only be achieved by operating the (analog) power amplifier (PA) very linearly. This limits the achievable efficiency of a practical TX configuration. Alternatively, a nonlinear PA can be used, which is linearized by digital pre-distortion (DPD). This latter approach is used in (4G) macro-cell base stations, but the addition of such an advanced DPD unit increases system complexity and energy consumption. This holds in particular for 5G handsets and massive multiple-input multiple-output (mMIMO) 5G base stations, which use beamforming and offer higher data rates to their end users. In these applications, the required RF output power per individual transmitter is rather low (at most a few watts). However, since many more transmitters are used (e.g. 64x to 256x more than in 4G base stations), using an advanced DPD unit in each individual TX line-up, with its associated power consumption, becomes impractical. To meet these changing requirements, it is desirable to find new circuit-level TX solutions that avoid the traditional trade-off between linearity and efficiency. To this end, this doctoral work focuses on developing new digital circuits in advanced CMOS technologies that meet the needs of modern wireless applications with wideband transmit signals. The techniques developed in this dissertation target an inherently linear conversion of the amplitude code word (ACW) to the transmit signal.
This approach either completely removes the need for a (power-hungry) advanced DPD unit or reduces it to a much simpler (and therefore less power-hungry) DPD unit for the most demanding applications (e.g. operation with very large bandwidths). The circuit techniques developed in this dissertation consequently provide excellent TX efficiency while remaining compatible with wideband efficiency enhancement techniques such as Doherty. The proposed circuit techniques can also correct for variations in process, voltage, load, and temperature of the transmitter. The outline of this dissertation is as follows:

Chapter 1 gives an introduction to the field of wireless communication and the most common modulator architectures used to create complex modulated TX signals.

In Chapter 2, the polar TX principle and digital polar TX architectures are discussed in more detail. Special attention is given to RFDAC-based solutions that can meet the requirements of wideband phase and amplitude modulators. The architecture and design of class-E (D)PAs and Doherty DPA transmitters are also described.

In Chapter 3, an overview is given of the modeling techniques needed to describe the behavior of nonlinear systems. It covers, among others, Volterra series, memory polynomials (MP), and generalized MP (GMP) models, as well as parameter estimation techniques such as the least-squares (LS) algorithm. It is shown that the nonlinearity of the TX signal in the passband can be translated into a complex baseband signal with a nonlinearity. This provides the basis for digital pre-distortion techniques that operate in baseband rather than at the RF frequency. To make optimal use of switch-mode DPA operation, new basis functions are introduced that better match the character of the DPA nonlinearities. This allows the order of the nonlinear kernels in the mathematical DPD description to be reduced drastically. In addition, various aspects of DPD are described, including the theory of using sub-sampling for nonlinear system identification and DPD model extraction.

Chapter 4 introduces three new circuit-level linearization techniques for the implementation of switched-mode digital power amplifiers (DPAs), namely nonlinear sizing, overdrive-voltage control, and the use of multiphase RF clocks. These techniques make it possible to avoid any form of DPD in low-power applications (e.g. handheld mobile devices) or to greatly simplify the DPD task in demanding applications (such as wideband 5G base stations). They also enable digital correction of the amplitude-code-word (ACW) transfer in terms of the ACW-AM and ACW-PM curves. Any changes caused by variations in supply voltage, temperature, transmit frequency, and output load can thereby be compensated. As a theoretical basis, the nonlinear behavior of a class-E DPA is thoroughly analyzed, resulting in equations that predict the ACW-AM and ACW-PM curves of the DPA. Two different intrinsically linear DPA prototypes are designed, fabricated, and measured: one with an on-chip matching network (MN) and one with an off-chip MN based on a compensated transmission-line Marchand balun.

In Chapter 5, an intrinsically linear wideband class-E CMOS Doherty DPA is presented. Here too, analytical equations are given to predict the ACW-AM and ACW-PM curves. At the system level, the importance is emphasized of minimizing any timing differences between the DPA branches, together with their impact on the ACW-AM and ACW-PM performance. The design and implementation of a new off-chip Doherty output network, based on a compensated Marchand balun with coupled lines, are given. Using circuit-level linearization techniques for Doherty TX configurations, two separate chips with a comparable architecture but different DPA parameters are designed and fabricated. The measured results confirm the theory introduced earlier. The new uncompromising design method defines the new state of the art for DPD-free transmitters in terms of linearity and efficiency.

In Chapter 6, the theory is given for the factors that limit the linearity of a digital polar transmitter (DTX). Two less-studied but nevertheless important system parameters for DTX operation are also investigated, namely nonuniform quantization noise and spectral sampling replicas (SSR) of the PM signal. Practical solutions for relaxing these limits are given. By combining the introduced circuit-level linearization techniques with digital pre-distortion based on the iterative learning control (ILC) technique, a linearity level was achieved and measured that lies very close to the theoretical quantization noise limit. Furthermore, a new real-time direct-learning DPD inspired by the ILC technique is proposed. In contrast to conventional direct-learning DPDs, the proposed technique extracts its parameters directly by means of a least-squares (LS) algorithm. This latter approach yields a very low computational load and makes it possible to meet the most demanding linearity-bandwidth requirements at the lowest possible energy consumption.

Hoofdstuk 7 geeft de belangrijkste conclusies van dit proefschrift met suggesties voor toekomstig onderzoek.


# **INTRODUCTION**

One of the first telecommunication systems based on electrical signals, invented by Charles Wheatstone and William Cooke in 1837 [1], was in fact a pseudo-digital communication system. It consisted of a five-needle telegraph that needed five wires and could only code 20 letters of the alphabet. However, transmitting wideband signals such as audio or video was impossible with such a system, as the actual data rate was limited by how fast the human operator could encode or decode the telegraph codes. It would take seven more decades before a truly audible wireless transmission system was invented and tested by Reginald Fessenden in 1906 [2]. Fessenden used an electromechanical generator (Fig. 1.1a), driven by an external motor or a steam turbine, to generate 50-100 kHz RF power for his amplitude-modulation (AM) transmitter (TX), which had an antenna tower higher than 100 m. Around the same time, more advanced technologies were being developed and tested to achieve a better form factor and power/efficiency at a lower cost, such as the diode valve in 1904 by John Ambrose Fleming [3], and later the triode vacuum tube (Fig. 1.1b) in 1907 by Lee de Forest [3]. The triode tube found widespread use around 1912, as it could be used to amplify voltage and thus RF power. It even remained in use until two decades after the invention of the transistor in 1947 by John Bardeen, Walter Brattain and William Shockley [4]. Starting in the 1970s, solid-state semiconductor transistors became increasingly popular in wireless designs. Nowadays, solid-state transistors (Fig. 1.1c) are used as discrete components in most high-power RF transmitters (e.g. in base stations), while in RF applications with low to medium output power, the trend in recent years has been to integrate the complete TX line-up, including the power amplifier, on a single chip (Fig. 1.1d).

In recent years, the demand for higher data rates has grown, most of which has been driven by the entertainment industry. Online video streaming accounts for more than 75% of the overall internet bandwidth consumed [5]. Wireless communication markets and industries have also been affected by this demand, as can be seen from the evolution of mobile communication standards from 1G (analog) and 2G (14.5 Kb/s) to 3G (20-100 MB/s), 4G (100-1000 MB/s), 5G (1-10 Gb/s), and so on. This data-rate trend is summarized in Fig. 1.2 [6], showing a growth factor of 50 per decade. New generations of communication systems will ultimately support online 8K video streaming, high-definition augmented reality (AR), virtual reality (VR) and the internet-of-things (IoT), which will require not only significant design and engineering efforts, but also novel ideas and innovations to tackle the TX/RX design challenges.

Modern digital wireless communication systems use a combination of amplitude and phase modulation of the RF carrier signal to increase the spectral efficiency of their TX signals. In these transmitters, the original digital baseband (BB) input data is converted into two parallel bit-streams that represent the In-phase (I) and Quadrature (Q) data, which are treated completely independently of each other, assuming a perfectly orthogonal relation. It is this orthogonality assumption that allows the original digital baseband data to be mapped onto a two-dimensional (2D) plane.

Figure 1.1: Evolution of RF-power generation: (a) an electromechanical RF power generator for an AM TX [2] with dimensions in the order of meters, (b) an early model of De Forest's triode vacuum tube [3] as the final stage of a TX with dimensions in the order of ten centimeters (Image courtesy of Reverse Time Page at http://uv201.com), (c) a typical modern high-power RF transistor as the final stage of a TX with dimensions in the order of centimeters, (d) the world's first fully digital single-chip Doherty transmitter, including the baseband and final RF power stage circuitry with dimensions in the order of millimeters, as designed by the ELCA research group of the Delft University of Technology [7].

Many different modulation standards can be constructed based on such a 2D representation, of which Quadrature-Amplitude-Modulation (QAM) is one of the most well-known. In this particular TX signal representation, the baseband data are mapped onto the 2D constellation diagram, in which the order of the modulation is the number of data points in the constellation diagram. These "complex" 2D data symbols are represented by the (real) In-phase "I" data on the horizontal axis and the (imaginary) Quadrature "Q" data on the vertical axis. For example, Fig. 1.3 shows the constellation diagrams of 4-QAM (QPSK) and 16-QAM signals, where each data point represents 2 bits (1 bit In-phase, 1 bit Quadrature) and 4 bits (2 bits In-phase, 2 bits Quadrature), respectively. Other, more advanced modulation schemes can be utilized, e.g. orthogonal frequency-division multiplexing (OFDM), to improve specific properties of the TX signal (e.g. robustness against multipath channel fading) for particular communication scenarios [8].

Figure 1.2: Data-rate trends in wireless and wireline communication systems [6].
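The constellation mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `square_qam_constellation` is hypothetical, and the natural-binary assignment of bits to amplitude levels is an arbitrary choice, since the text does not specify a bit labeling (practical systems often use Gray coding instead).

```python
import math

def square_qam_constellation(order):
    """Return {bit label: complex point I + jQ} for a square QAM constellation.

    Assumption: the upper half of the bits selects the I level and the lower
    half selects the Q level, both in natural binary order.
    """
    bits_per_symbol = int(math.log2(order))
    if 2 ** bits_per_symbol != order or bits_per_symbol % 2 != 0:
        raise ValueError("order must be an even power of two (4, 16, 64, ...)")
    bits_per_axis = bits_per_symbol // 2
    levels = 2 ** bits_per_axis  # amplitude levels per axis
    points = {}
    for symbol in range(order):
        i_index = symbol >> bits_per_axis
        q_index = symbol & (levels - 1)
        label = format(symbol, "0{}b".format(bits_per_symbol))
        # Levels are odd integers centered on zero: ..., -3, -1, +1, +3, ...
        points[label] = complex(2 * i_index - (levels - 1),
                                2 * q_index - (levels - 1))
    return points

qpsk = square_qam_constellation(4)    # 2 bits per symbol
qam16 = square_qam_constellation(16)  # 4 bits per symbol
```

With this labeling, `qam16["0000"]` is the corner point at -3 - 3j and `qam16["1111"]` is the opposite corner at +3 + 3j.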

#### **1.1.** MAIN TRANSMITTER ARCHITECTURES

There are various techniques to modulate the baseband data onto the RF carrier. Below we briefly discuss the most well-known ones.

#### **1.1.1.** CARTESIAN



Figure 1.3: 4-QAM (left) and 16-QAM (right) constellation diagrams.

#### SUMMING OF TWO AMPLITUDE VARYING SIGNALS WITH A CONSTANT 90 DEGREE PHASE DIFFERENCE

The Cartesian approach uses complex summing (through the use of a 90-degree phase difference) of the two orthogonal amplitude signals (I and Q), as shown conceptually in Fig. 1.4a. The QAM modulation concept was originally proposed by Campopiano and Glazer in 1962 [9]. It is important to realize that the summing of the I and Q signals must be perfectly orthogonal. Therefore, in practical implementations, the summing of the I and Q signals is mostly done at low power levels in the current domain. Consequently, combining the Cartesian signals in the transmitter output stage, where high-efficiency operation is important, is typically problematic without special measures. For this reason, most practical systems prefer a low-power Cartesian modulator (e.g. a quadrature mixer configuration), followed by a linear amplifier line-up. The latter typically comes at the cost of linearity/efficiency performance. Nowadays, Cartesian architectures are the workhorse of wireless systems. Their implementation considerations will be discussed in more detail in Sections 1.3 and 1.4.

#### **1.1.2.** OUTPHASING (LINC)

#### SUMMING OF TWO CONSTANT-AMPLITUDE SIGNALS WITH VARYING PHASE OFFSET

In an outphasing TX, originally proposed by Chireix in 1935 [10], two constant-amplitude signals are phase-modulated, with phases calculated as $\Phi_1 = \arctan(Q/I) + \arccos(\sqrt{I^2 + Q^2}/2)$ and $\Phi_2 = \arctan(Q/I) - \arccos(\sqrt{I^2 + Q^2}/2)$. This technique, which avoids the need for a linear amplifier line-up in the transmitter, is often referred to as LINC, an acronym for "Linear Amplification using Nonlinear Components", as proposed by Cox in 1974 [11], and shown conceptually in Fig. 1.4b. However, the signal combining itself still needs to be done such that the two power amplifiers (PAs) do not interfere/interact with each other. In practical implementations, this can be accomplished by using an isolating power combiner, although this comes at the cost of overall system efficiency, as it achieves its maximum efficiency only at peak output power conditions. However, by using a non-isolating Chireix power combiner [12–14], high efficiency at both peak and back-off output power levels can be achieved, improving the average efficiency at the cost of an increase in interaction between the branches.

Figure 1.4: Concepts of the three main TX architectures: (a) Cartesian, (b) polar, (c) outphasing (LINC).
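The outphasing decomposition can be checked numerically with a minimal sketch. The function name `outphasing_phases` is hypothetical, and `atan2` is used in place of $\arctan(Q/I)$ so that all four quadrants are covered (an implementation choice, not something the text prescribes). The envelope is assumed to be normalized so that $\sqrt{I^2+Q^2} \le 2$, i.e. each branch has unit amplitude.

```python
import cmath
import math

def outphasing_phases(i, q):
    """Split the baseband point I + jQ into two unit-amplitude phases."""
    rho = math.hypot(i, q)
    if rho > 2.0:
        raise ValueError("envelope must not exceed 2 for unit-amplitude branches")
    theta = math.atan2(q, i)        # common phase (four-quadrant arctan)
    delta = math.acos(rho / 2.0)    # outphasing angle
    return theta + delta, theta - delta

# Recombining the two constant-amplitude phasors restores the original point.
phi1, phi2 = outphasing_phases(0.6, -0.8)
recombined = cmath.exp(1j * phi1) + cmath.exp(1j * phi2)
```

The sum of the two phasors equals $2\cos(\delta)\,e^{j\theta} = \rho\,e^{j\theta}$, so `recombined` matches 0.6 - 0.8j up to rounding.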


#### **1.1.3.** POLAR

#### ONE SIGNAL WITH AMPLITUDE AND PHASE MODULATION

To improve on overall TX efficiency performance and to avoid the problems related to summing two signals in the analog domain, the polar modulation technique has been developed, which is based on the envelope elimination and restoration (EER) technique proposed by Kahn in 1952 [15]. In this technique, by using an envelope detector and a limiter, the input modulated RF signal is decomposed into an envelope signal (which is the amplitude modulation (AM) signal) and a constant envelope phase-modulated (PM) RF signal, respectively. The PM signal drives the PA, and the AM signal is used to modulate the voltage supply of the PA. In view of this, a modulated RF signal can be decomposed into its AM and PM signals as follows:

$$X_{RF}(t) = I(t)\cos(\omega_0 t) - Q(t)\sin(\omega_0 t) = \rho(t)\cos(\omega_0 t + \Phi(t)) \tag{1.1}$$

$$AM(t) = \rho(t) = \sqrt{I(t)^2 + Q(t)^2} \tag{1.2}$$

$$PM(t) = \cos\left(\omega_0 t + \Phi(t)\right) = \cos\left(\omega_0 t + \arctan\left(Q(t)/I(t)\right)\right) \tag{1.3}$$

In a polar TX, the amplitude and phase are first modulated independently after converting the input I/Q signal to amplitude  $(AM = \sqrt{I^2 + Q^2})$  and phase  $(\Phi = \arctan(Q/I))$ , and then recombined (multiplied by each other) at the output by the PA, as shown in Fig. 1.4c. In this approach, the PA is driven by a constant-amplitude PM signal, eliminating the need for an RF power combiner and thus improving the overall efficiency of the system, while the phase-amplitude recombination remains more or less orthogonal by nature. In Sections 1.3 and 1.4, the implementation considerations will be discussed in more detail.
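As a quick numerical check of the Cartesian-to-polar equivalence in Eq. (1.1), the following sketch evaluates both forms of the RF signal at a few time instants. All function names and numerical values are hypothetical, and `atan2` replaces $\arctan(Q/I)$ to handle all four quadrants.

```python
import math

def iq_to_polar(i, q):
    """Return (AM, phase) of the baseband point I + jQ, as in Eq. (1.2)."""
    return math.hypot(i, q), math.atan2(q, i)

def rf_cartesian(i, q, w0, t):
    """Left-hand side of Eq. (1.1): I cos(w0 t) - Q sin(w0 t)."""
    return i * math.cos(w0 * t) - q * math.sin(w0 * t)

def rf_polar(i, q, w0, t):
    """Right-hand side of Eq. (1.1): rho cos(w0 t + phi)."""
    rho, phi = iq_to_polar(i, q)
    return rho * math.cos(w0 * t + phi)

W0 = 2.0 * math.pi * 2.4e9  # example carrier frequency (assumed value)
max_difference = max(
    abs(rf_cartesian(-0.3, 0.7, W0, n * 1e-11) - rf_polar(-0.3, 0.7, W0, n * 1e-11))
    for n in range(100)
)
```

`max_difference` stays at rounding-error level, which confirms that multiplying the AM signal by the constant-envelope PM carrier reproduces the Cartesian sum.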

#### **1.1.4.** HYBRID ARCHITECTURES

Besides the three main TX architectures, there are also hybrid architectures, which combine two of the three main architectures. For example, by adding an auxiliary phase modulation to the amplitude vectors in a Cartesian TX, a hybrid polar architecture (also known as multiphase Cartesian) [16] can be formed. Moreover, by adding auxiliary amplitude modulation to the phase vectors in an outphasing TX, a hybrid architecture known as mixed-mode or multi-level outphasing [12] can be created.

#### **1.2.** TRANSMITTER FIGURES-OF-MERIT

In order to be able to quantify the performance of different transmitter implementations, we will briefly discuss the most commonly used transmitter Figures-of-Merit.

#### **1.2.1.** EFFICIENCY

The drain efficiency (DE) and power-added efficiency (PAE) of an analog PA (without the drivers) are defined as follows:

$$DE = \frac{P_{OUT}}{V_{DD,PA}I_{DC,PA}} \tag{1.4}$$

$$PAE_{Analog} = \frac{P_{OUT}}{V_{DD,PA}I_{DC,PA} + P_{RF,IN}} \tag{1.5}$$

However, in a digital-intensive transmitter implementation, the pre-drivers can be implemented by simple logic gates, while the actual drive power to the output stage device(s) is very small. For these cases, the following PAE definition is more appropriate and is used in this thesis:

$$PAE_{Digital} = \frac{P_{OUT}}{V_{DD,DPA}I_{DC,PA} + P_{DC,Drivers}} \tag{1.6}$$

where  $P_{DC,Drivers}$  also includes the power consumption of the circuit-level linearizer for the work presented in this thesis.

For both digital and analog TXs, the system efficiency (SE) is defined as the ratio of the output RF power to the sum of the total DC supply power consumed by the entire TX (including the phase modulator and other circuits) and the input RF power:

$$SE_{TX} = \frac{P_{OUT}}{P_{DC,Total} + P_{RF,IN}} \tag{1.7}$$
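The efficiency definitions in Eqs. (1.4), (1.6) and (1.7) translate directly into code. The sketch below uses hypothetical function names and made-up operating-point values purely for illustration; none of the numbers are measurement results from this work.

```python
def drain_efficiency(p_out, v_dd, i_dc):
    """Eq. (1.4): output RF power over the DC power of the output stage."""
    return p_out / (v_dd * i_dc)

def pae_digital(p_out, v_dd, i_dc, p_dc_drivers):
    """Eq. (1.6): the driver (and linearizer) DC power is added to the denominator."""
    return p_out / (v_dd * i_dc + p_dc_drivers)

def system_efficiency(p_out, p_dc_total, p_rf_in=0.0):
    """Eq. (1.7): output RF power over total DC power plus RF input power."""
    return p_out / (p_dc_total + p_rf_in)

# Hypothetical operating point: 100 mW output, 0.7 V supply, 250 mA DC current,
# 20 mW for the drivers and 250 mW for the complete TX.
de = drain_efficiency(0.100, 0.7, 0.250)
pae = pae_digital(0.100, 0.7, 0.250, 0.020)
se = system_efficiency(0.100, 0.250)
```

Because each definition adds further power terms to the denominator, the three figures can only decrease in the order DE, PAE, SE for a given operating point.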

#### **1.2.2.** SPECTRAL PURITY

The out-of-band spectral purity of a transmitter is measured and characterized as the adjacent channel power-ratio (ACPR), which is defined as follows:

$$ACPR(dBc) = 10\log_{10}\left(\frac{P_{Adj}}{P_{Main}}\right) \tag{1.8}$$

where  $P_{Adj}$  and  $P_{Main}$  are the power in the adjacent and main channels, respectively. The ACPR is used to measure the linearity of the TX by measuring the distortion of the TX signal. However, for digital-intensive TX solutions it will also depend on the quantization noise power density, which makes it dependent on the DAC/RFDAC quantization resolution as well as the sampling rate. This will be explained in more detail in Chapter 6. In multi-channel/multi-user communication systems, a good (low) ACPR is of great importance to guarantee the quality of each communication channel without it being disrupted by the presence of other users in neighboring channels.

Figure 1.5: Conventional analog-intensive Cartesian TX.
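A minimal sketch of Eq. (1.8), assuming the channel powers are obtained by summing the bins of a sampled power spectral density. The helper names and the spectrum values are hypothetical.

```python
import math

def band_power(psd, start, stop):
    """Sum the power of the PSD bins in the half-open range [start, stop)."""
    return sum(psd[start:stop])

def acpr_dbc(p_adj, p_main):
    """Eq. (1.8): adjacent-channel power relative to the main-channel power."""
    return 10.0 * math.log10(p_adj / p_main)

# Hypothetical spectrum: 10 adjacent-channel bins followed by 10 main-channel
# bins, with the adjacent-channel density 50 dB below the main channel.
example_psd = [1e-5] * 10 + [1.0] * 10
example_acpr = acpr_dbc(band_power(example_psd, 0, 10),
                        band_power(example_psd, 10, 20))
```

For this made-up spectrum the result is -50 dBc, since both channels span the same number of bins.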

#### **1.2.3.** SIGNAL ACCURACY

The quality of the transmitted digital baseband signal (i.e. the bit-error-rate (BER)) depends highly on the in-band accuracy of the TX chain, which is measured by the error-vector magnitude (EVM), defined as follows:

$$EVM(dB) = 20\log_{10}\left(\frac{\sqrt{\frac{1}{N_{tot}}\sum_{i=1}^{N_{tot}} \left|IQ_{BB,Out}(i) - IQ_{BB,Ideal}(i)\right|^{2}}}{\sqrt{\frac{1}{N_{tot}}\sum_{i=1}^{N_{tot}} \left|IQ_{BB,Ideal}(i)\right|^{2}}}\right) \tag{1.9}$$

where  $IQ_{BB,Out}$  is the measured output baseband complex (I+jQ) signal and  $IQ_{BB,Ideal}$  is the ideal input baseband complex signal. As the constellation diagram becomes denser, less error can be tolerated if the data are to be demodulated properly. Therefore, a higher order of QAM modulation requires a lower EVM, typically from -19 dB (11.2%) for 16-QAM to -30 dB (3.2%) for 256-QAM. In practice, the EVM is limited by many factors, such as the linearity of the TX, the resolution of the DACs/RFDACs (i.e. their quantization noise), the timing matching between the I and Q or the AM and $\Phi$ paths, the phase noise of the LO, and the thermal noise, among others.
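The EVM definition in Eq. (1.9) and the dB-to-percent conversion quoted above can be sketched as follows. The function names and the test signal (a QPSK pattern with a 1% gain error) are hypothetical choices for illustration.

```python
import math

def evm_db(measured, ideal):
    """Eq. (1.9): RMS error-vector magnitude relative to the RMS ideal signal."""
    error_power = sum(abs(m - r) ** 2 for m, r in zip(measured, ideal))
    reference_power = sum(abs(r) ** 2 for r in ideal)
    # 20*log10 of a ratio of RMS values equals 10*log10 of the power ratio.
    return 10.0 * math.log10(error_power / reference_power)

def evm_percent(evm_in_db):
    """Convert an EVM in dB to a percentage."""
    return 100.0 * 10.0 ** (evm_in_db / 20.0)

ideal_symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
measured_symbols = [1.01 * s for s in ideal_symbols]  # 1% gain error
gain_error_evm = evm_db(measured_symbols, ideal_symbols)
```

A pure 1% gain error gives an EVM of -40 dB, and `evm_percent(-19)` and `evm_percent(-30)` reproduce the roughly 11.2% and 3.2% figures quoted in the text.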


#### **1.3.** ANALOG-INTENSIVE TRANSMITTERS

#### **1.3.1.** ANALOG CARTESIAN TX

In a conventional analog-intensive Cartesian TX configuration, as shown in Fig. 1.5, the digital input signals, I and Q, are converted to the analog domain by two DACs and passed through a low-pass filter (LPF) to remove the sampling spectral replicas (SSRs). Two mixers then up-convert the analog I and Q signals by multiplying them with two RF signals with a 90-degree phase difference. These two amplitude-modulated RF signals are combined to create a single RF signal of which both the amplitude and phase are modulated. Such a circuit is called a Cartesian modulator. As the resulting signal is normally low-power, it should be amplified before being sent to the antenna. This is done using a power amplifier, which in general also requires a driver stage. In such a system, the PA is designed for a high-efficiency mode of operation, and this energy efficiency often comes at the cost of circuit linearity. Therefore, digital predistortion (DPD) is often applied to guarantee the overall linearity and spectral purity of the TX chain. However, to ensure good wideband performance of the whole TX line-up, the DACs, mixers, combiner and even the pre-driver should be sufficiently linear. Otherwise, the DPD cannot reach the required spectral purity. Since the PA is nonlinear, its input should be predistorted in such a way that, after passing through the nonlinear PA function, the output signal is identical to the original input signal representation, except for a gain factor. Nonlinear distortion yields an undesired bandwidth expansion of the modulated signal. Therefore, the DPD must be capable of handling a larger bandwidth ( $\sim 5\times$ ) than the original modulation. For example, for a 100 MHz signal, the TX line-up including the driver should be capable of handling a modulation bandwidth (BW) of up to 500 MHz. Meeting such a demanding bandwidth can take a significant amount of engineering time and DPD power consumption, unless the PA is designed for sufficiently linear operation, which typically compromises the achievable power efficiency.
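The predistortion idea can be illustrated with a toy example. The sketch below assumes a memoryless, real-valued compressive PA model $y = u - u^3$ (the same simple third-order model used for illustration in Section 1.5) with unity target gain, and finds the predistorted input by fixed-point iteration. It is not one of the DPD algorithms developed in this thesis, and the function names are hypothetical.

```python
def pa_model(u):
    """Toy compressive PA: y = u - u^3 (an assumed model, for illustration only)."""
    return u - u ** 3

def predistort(x, iterations=100):
    """Find u such that pa_model(u) == x by iterating u <- x + u^3.

    The iteration converges for |u| < 1/sqrt(3), i.e. below the point where
    the toy PA model saturates (|x| < about 0.385).
    """
    u = x
    for _ in range(iterations):
        u = x + u ** 3
    return u

test_inputs = [-0.35, -0.1, 0.0, 0.2, 0.35]
linearized = [pa_model(predistort(x)) for x in test_inputs]
```

After predistortion, the cascade of predistorter and toy PA returns the original input values, so `linearized` matches `test_inputs` up to rounding. Note that the predistorted drive is larger than the original input, which is one way to see why predistortion expands the signal bandwidth.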

#### **1.3.2.** ANALOG POLAR TX

A typical analog polar TX is depicted in Fig. 1.6a. Here, the CORDIC<sup>1</sup> [17, 18] calculates the amplitude and phase of the input complex I/Q data in the digital domain. The amplitude modulator is normally a low-drop-out (LDO) voltage regulator, with or without an energy-efficient DC-DC converter, while the phase modulator is normally based on a closed-loop phase-locked-loop (PLL). Compared to an ideal analog Cartesian TX line-up, where all the signal processing operations are linear in nature, in a polar (or outphasing) TX, the conversion from the I/Q data to AM and  $\Phi$ (or  $\Phi_1$ ,  $\Phi_2$ ) is highly nonlinear. Thus, at the output of the CORDIC, the resulting bandwidths of the AM and  $\Phi$ signals will be at least 2× and 5× the bandwidth of the original input signal, respectively, as shown in Fig. 1.6b. Traditionally, this has imposed a major limit on the maximum achievable signal bandwidth that can be handled by an analog-intensive polar TX. Therefore, although a polar TX configuration can normally reach higher power efficiency, it is mostly used for applications with a low to medium signal bandwidth.

<sup>1</sup>CORDIC is an acronym for COordinate Rotation DIgital Computer.
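To make the role of the CORDIC more concrete, the following is a minimal floating-point sketch of the classic vectoring-mode CORDIC, which rotates the input vector onto the I axis using shift-and-add micro-rotations and accumulates the applied angle. It is a textbook illustration, not the implementation used in this work: a hardware CORDIC would use fixed-point arithmetic, a precomputed arctangent table and a constant gain correction, and the function name is hypothetical.

```python
import math

def cordic_vectoring(i, q, iterations=24):
    """Return (amplitude, phase in radians) of the point I + jQ."""
    x, y, phase = float(i), float(q), 0.0
    # Pre-rotate by 180 degrees so the vector starts in the right half-plane,
    # where the micro-rotations below are guaranteed to converge.
    if x < 0.0:
        x, y = -x, -y
        phase = math.pi if q >= 0 else -math.pi
    gain = 1.0
    for k in range(iterations):
        step = 2.0 ** -k                      # a bit shift in hardware
        direction = -1.0 if y > 0.0 else 1.0  # rotate toward the I axis
        x, y = x - direction * y * step, y + direction * x * step
        phase -= direction * math.atan(step)  # table look-up in hardware
        gain *= math.sqrt(1.0 + step * step)  # accumulated CORDIC gain
    return x / gain, phase

amplitude, phase = cordic_vectoring(3.0, 4.0)
```

For the input 3 + 4j, the sketch converges to an amplitude of 5 and a phase equal to `math.atan2(4, 3)`, within the resolution set by the number of iterations.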

The combining of the AM and PM signals is done by the PA itself. Any delay mismatch between these two signals upon arriving at the PA will result in both in-band and out-of-band distortion, which increases the adjacent channel power-ratio (ACPR) and the error-vector magnitude (EVM)<sup>2</sup>. Simulation results show that the EVM and  $\sqrt{ACPR}$  both increase almost linearly (i.e. 6 dB/octave) with the signal bandwidth or the timing mismatch. Therefore, when the timing mismatch is normalized to 1/BW, the amount of degradation can be predicted well. This is shown in Fig. 1.6c, where the ACPR and EVM increase by ~6 dB when the timing mismatch is doubled. In contrast, in a Cartesian TX, the delay mismatch between the two RF signals only increases the EVM, with no effect on the ACPR (assuming no load-pull effect caused by the interaction between the I and Q signal paths).
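The 6 dB-per-doubling trend of the EVM can be reproduced with a small experiment. The sketch below is not the simulation setup behind Fig. 1.6c: it assumes a hypothetical multi-tone baseband signal with unit bandwidth (so the delay is already normalized to 1/BW), delays only the AM path, and evaluates the EVM only (not the ACPR). All names and parameter values are illustrative.

```python
import cmath
import math
import random

def multitone(t, tones):
    """Complex baseband test signal: a sum of tones given as (frequency, amplitude)."""
    return sum(a * cmath.exp(2j * math.pi * f * t) for f, a in tones)

def evm_db_with_am_delay(tau, tones, n_samples=2000, duration=50.0):
    """EVM (dB) when the AM path lags the PM path by tau."""
    error_power = reference_power = 0.0
    for n in range(n_samples):
        t = duration * n / n_samples
        x = multitone(t, tones)
        am_delayed = abs(multitone(t - tau, tones))          # delayed envelope
        y = am_delayed * cmath.exp(1j * cmath.phase(x))      # undelayed phase
        error_power += abs(y - x) ** 2
        reference_power += abs(x) ** 2
    return 10.0 * math.log10(error_power / reference_power)

rng = random.Random(0)
# 16 unit-amplitude tones with random phases, spread over a bandwidth of 1.
tones = [(-0.5 + k / 15.0, cmath.exp(2j * math.pi * rng.random()))
         for k in range(16)]
evm_tau = evm_db_with_am_delay(0.001, tones)
evm_2tau = evm_db_with_am_delay(0.002, tones)
```

For small mismatches the error vector is proportional to the delay times the slope of the envelope, so doubling the delay should raise the EVM by about 6 dB in this sketch, in line with the trend described above.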

#### **1.4.** DIGITAL-INTENSIVE TRANSMITTERS

#### **1.4.1.** DIGITAL CARTESIAN TX

By removing the LPF, and using a bit-wise mixer-and-DAC operation in an arrayed topology, we can make a circuit configuration known as an RFDAC [19–21], which directly (up)converts the input digital signal to an RF signal. In an RFDAC, the mixer is divided into an array of sub-mixers, implemented by simple AND or XOR logic gates. These, combined with the DAC unit-cells, form an array of sub-RFDACs, which provides the desired RFDAC function. By using two RFDAC branches that are driven with a 90-degree phase shift and directly combining their outputs, a Cartesian direct digital transmitter (DDTX) can be formed, as shown in Fig. 1.7. Since there is no explicit low-pass filtering, except for the RFDAC's intrinsic zero-order-hold (ZOH) behavior, sampling spectral replicas appear rather strongly at the output, especially if the modulation bandwidth is high compared to the RF frequency. A common solution is to push these sampling replicas further out in frequency and attenuate them as much as possible by increasing the RFDAC's sampling rate. Since the linearity constraints on the mixers are very relaxed, an RFDAC-based modulator typically consumes less power than a conventional analog modulator, while being able to deliver more output power.

<sup>2</sup>The definitions of ACPR and EVM are given in Section 1.2.

Figure 1.6: (a) Conventional analog-intensive Polar TX, (b) spectrum of AM and PM signals compared to the input I/Q signal, and (c) EVM and ACPR of a 64-QAM signal vs. AM-PM timing mismatch normalized to 1/BW.
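The benefit of a higher sampling rate can be estimated from the sinc-shaped frequency response of an ideal zero-order hold. The sketch below assumes an ideal ZOH with no other filtering and considers a baseband component at offset `f_offset` from the carrier, whose first replica appears at `fs - f_offset`. The function names and the example frequencies are hypothetical.

```python
import math

def zoh_gain(f, fs):
    """Magnitude response of an ideal zero-order hold, |sinc(f / fs)|."""
    x = math.pi * f / fs
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)

def first_replica_rejection_db(f_offset, fs):
    """Level of the first sampling replica relative to the wanted component (dB)."""
    return 20.0 * math.log10(zoh_gain(fs - f_offset, fs) / zoh_gain(f_offset, fs))

# Example: a component 50 MHz from the carrier, sampled at 1 GS/s and at 2 GS/s.
rejection_1g = first_replica_rejection_db(50e6, 1e9)
rejection_2g = first_replica_rejection_db(50e6, 2e9)
```

For an ideal ZOH the ratio simplifies to `f_offset / (fs - f_offset)`, so doubling the sampling rate lowers the first replica by roughly 6 dB in this example (from about -25.6 dB to about -31.8 dB).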

#### **1.4.2.** DIGITAL POLAR TX

Similarly, a polar DDTX can be constructed by using an RFDAC-based power amplifier, known as digital PA, as shown in Fig. 1.8. The phase modulator can be built based on an all-digital phase-locked-loop (ADPLL) or by using two RFDACs in quadrature operation along with a limiter. The design and implementation of a digital polar TX will be explained in more detail in Chapter 2.

In an analog polar TX or a Cartesian TX with a nonlinear PA, the signal bandwidth is normally limited to 20% of the bandwidth of the TX chain. However, in a DDTX approach, the bandwidth is mostly limited by the maximum up-sampled data rate. In practice, the signal bandwidth is usually limited to less than 20% of the final sampling rate. As analog solutions might require multiple chips and modules, a DDTX can be superior to an analog TX in terms of system integration and efficiency.

Figure 1.7: Digital-intensive Cartesian TX.

Figure 1.8: Digital-intensive polar TX.

#### **1.5.** DESIGN CHALLENGES OF A WIDEBAND EFFICIENT TX

The largest portion of the DC supply power in a typical TX is generally considered to be consumed by the final stage of the TX chain, which is the power amplifier. In general, a PA is most energy-efficient when it is biased in a nonlinear mode of operation. However, a nonlinear system tends to generate not only higher harmonics, but also intermodulation products, which appear inside and around the modulated signal at the RF frequency, thus degrading both the ACPR and EVM. This can be understood simply by calculating the output of a system with a third-order nonlinearity driven with a two-tone signal, which can be described as follows:

Figure 1.9: Example of two-tone input/output signals of a nonlinear system with a third-order nonlinearity.

$$\begin{aligned} y(t) = x(t)^{3} &= \left(\cos(\omega_{0}t + \Delta\omega t/2) + \cos(\omega_{0}t - \Delta\omega t/2)\right)^{3} \\ &\propto 9\cos(\omega_{0}t \pm \Delta\omega t/2) + 3\cos(\omega_{0}t \pm 3\Delta\omega t/2) + 3\cos(3\omega_{0}t \pm \Delta\omega t/2) + \cos(3\omega_{0}t \pm 3\Delta\omega t/2) \end{aligned} \tag{1.10}$$

where the first two terms represent the main signal and the third-order intermodulation products around the carrier, respectively, and the last two terms show the generated products around the third harmonic of the carrier. Assuming a simple compressive third-order model of  $y(t) = x(t) - x(t)^3$  for the PA, the input/output signals in a two-tone test are shown in Fig. 1.9. Furthermore, in a nonlinear system with memory effects (e.g. due to the biasing circuit or thermal effects), the impact of a nonlinearity tends to become worse as the signal bandwidth increases. This is an important observation, since the overall data rate is the product of the symbol rate and the number of bits per symbol. Normally, the symbol rate of a TX is limited by the analog signal bandwidth. Thus, in order to increase the spectral efficiency, a higher order of QAM modulation (more bits per symbol) is needed, which in turn requires low signal distortion. Therefore, achieving a higher data rate depends not only on the bandwidth of the circuitry, but also on the accuracy of the conversion from digital bits to analog signals; hence, a low EVM is required. Achieving high linearity, in terms of a low ACPR and EVM, with a wideband signal requires either a linear PA design, which lowers the system efficiency, or a nonlinear PA linearized by DPD, which can compromise the system efficiency due to the power consumption of the DPD unit, an effect that becomes more pronounced at lower TX powers.
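The 9 : 3 : 3 : 1 weighting in Eq. (1.10) can be verified numerically. The sketch below cubes a two-tone signal and measures the cosine amplitude at selected frequencies with a single-bin correlation. The tone frequencies (19 and 21 cycles per record, i.e. a "carrier" of 20 and a spacing of 2) and the record length are arbitrary assumptions chosen so that all products fall on integer bins below the Nyquist limit.

```python
import math

def tone_amplitude(samples, k):
    """Cosine amplitude at k cycles per record (single-bin correlation)."""
    n = len(samples)
    return 2.0 / n * sum(s * math.cos(2.0 * math.pi * k * i / n)
                         for i, s in enumerate(samples))

N = 256
F_LOW, F_HIGH = 19, 21  # the two tones: carrier at 20, spacing of 2
x = [math.cos(2.0 * math.pi * F_HIGH * i / N) + math.cos(2.0 * math.pi * F_LOW * i / N)
     for i in range(N)]
y = [v ** 3 for v in x]

# Upper-side products: main tone, IM3, and the two products near the third harmonic.
amplitudes = {k: tone_amplitude(y, k) for k in (21, 23, 61, 63)}
```

The measured amplitudes are 9/4 at the main tone (bin 21), 3/4 at the third-order intermodulation product (bin 23, i.e. 2·21 - 19), 3/4 at bin 61 and 1/4 at bin 63, which is the 9 : 3 : 3 : 1 ratio of Eq. (1.10) up to the common factor of 1/4.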

Therefore, in the view of the author, one of the biggest challenges in modern transmitter (TX) designs, when going from fourth generation (4G) to fifth generation (5G) communication networks, is to handle the increased linearity requirements without the need to compromise the energy-efficiency of the overall TX line-up.

#### **1.6.** THESIS OBJECTIVES

Based on the explanations in the previous section, the main objective of this thesis is to design and implement digital power amplifiers for polar TX architectures that are not only energy-efficient but also highly linear. In view of this, the first objective is to implement a digital PA in switch-mode operation to achieve high drain efficiency, and to utilize innovative circuit-level techniques to fully circumvent the DPD and thereby improve the system efficiency. This innovative DPD-less solution is aimed at low to medium output power levels.

The second objective is to improve the average efficiency when transmitting modulated signals with a high peak-to-average power ratio (PAPR). For this, the Doherty power combining technique is employed to increase the efficiency at the 6 dB power back-off (PBO) level, utilizing an innovative off-chip transmission line-based matching network to avoid the high passive losses of on-chip matching networks. The same circuit-level linearization techniques are applied to the Doherty configuration to achieve high spectral purity and signal accuracy without a DPD or with a low-complexity, light DPD.

The third objective is to study the fundamental limitations on the linearity and spectral purity of a digital polar TX. For this purpose, we introduce and investigate a system solution that combines circuit-level linearization techniques with low-complexity DPD techniques requiring only a minimum amount of computational power. This reduces the ACPR and EVM as much as possible and close to their theoretical limits. As a result of this study, an innovative direct-learning DPD is introduced.

#### **1.7.** THESIS OUTLINE

This dissertation is organized as follows:

#### DIGITAL POLAR TX AND THE DOHERTY TECHNIQUE BACKGROUND

In Chapter 2, to provide a fundamental understanding of how a digital polar TX actually operates, the different design aspects of the digital polar TX including the phase modulation, amplitude modulation, switch-mode (class-E) power amplifier, digitally controlled power amplifier, and some system-level considerations are described. Furthermore, the Doherty power combining technique, as an efficiency enhancement technique, as well as its analog and digital implementation are briefly explained.

#### **DPD BACKGROUND**

In Chapter 3, different techniques for the behavioral modeling of a nonlinear system are explained in order to establish a basic understanding of the nonlinearities of a digital TX and how to correct them. Based on these models, various digital predistortion (DPD) techniques for nonlinear systems with memory, as well as some system-level considerations, are introduced. In addition, the equivalent baseband model of a passband RF nonlinearity and the baseband model of amplitude-code-word (ACW)-AM and ACW-PM conversions are explained. Furthermore, new basis functions for linearizing a switch-mode DPA are introduced, and the theoretical foundations of using undersampling for identifying a nonlinear system are briefly explained.

#### NOVEL INTRINSICALLY LINEAR DIGITAL PA

In Chapter 4, three novel circuit-level linearization techniques for a switch-mode polar DPA are described: "nonlinear sizing" for amplitude-code-word (ACW)-to-AM correction, multiphase RF clocking for ACW-PM correction, and "overdrive-voltage tuning" for process/voltage/temperature (PVT) correction. These techniques avoid the need for DPD, or at least substantially reduce the DPD complexity. Furthermore, two AM-PM synchronization techniques are given. The chapter also presents the implementation details and measurement results of the first ever intrinsically linear polar class-E DPA, implemented in 40 nm CMOS without using any kind of DPD, with both on-chip and off-chip matching networks. The nonlinear behavior of a class-E DPA is thoroughly analyzed, and closed-form equations are given to predict the ACW-AM and ACW-PM curves of the DPA.

#### NOVEL INTRINSICALLY LINEAR DIGITAL DOHERTY PA

In Chapter 5, the design and implementation details as well as the measurement results of the first ever intrinsically linear polar class-E Doherty DPA are described. This DPA does not employ any kind of DPD and uses an off-chip matching network based on a novel transmission line-based Marchand balun with second-harmonic control. Furthermore, the nonlinear behavior of the Doherty class-E DPA is analyzed, and closed-form equations are given to predict the ACW-AM and ACW-PM curves. In addition, the system-level considerations of utilizing a digital-intensive Doherty polar DPA are addressed, with an extensive discussion of the impact of timing mismatch between the AM and PM paths and between the main and peak DPAs.

#### NOVEL DPD

In Chapter 6, the system-level considerations as well as the theoretical limits of polar DPA linearity are given. Considering these limitations, by using an improved offline iterative-learning-control (ILC) DPD algorithm, the measured ACPR and EVM of a single polar DPA (described in Chapter 4) are pushed very close to their minimum theoretical levels. Inspired by the offline ILC DPD, a novel real-time direct-learning DPD is presented. In this DPD approach, in contrast to conventional direct-learning DPDs, the DPD model parameters are extracted directly by the least-squares (LS) algorithm, with similar computational effort. The same DPD algorithm has been applied to the Doherty DPA (described in Chapter 5), and its measurement results are presented.

#### CONCLUSION

Finally, Chapter 7 summarizes the main findings of this dissertation and presents suggestions for future developments.

# REFERENCES

- [1] R. W. Burns, Soemmering, Schilling, Cooke and Wheatstone, and the electric telegraph, in Papers Presented at the Sixteenth I.E.E. Week-End Meeting on the History of Electrical Engineering (1988) pp. 70–79.
- [2] R. A. Fessenden, *Wireless Telephony*, Transactions of the American Institute of Electrical Engineers XXVII, 553 (1908).
- [3] M. Guarnieri, *The Age of Vacuum Tubes: Early Devices and the Rise of Radio Communications [Historical]*, IEEE Industrial Electronics Magazine 6, 41 (2012).

- [4] J. Bardeen and W. H. Brattain, *Physical principles involved in transistor action*, The Bell System Technical Journal 28, 239 (1949).
- [5] Cisco Visual Networking Index: Forecast and Trends, 2017–2022, [Online].
- [6] D. C. Daly, L. C. Fujino, and K. C. Smith, *Through the Looking Glass-2020 Edition: Trends in Solid-State Circuits From ISSCC*, IEEE Solid-State Circuits Magazine 12, 8 (2020).
- [7] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, *A fully-integrated digital-intensive polar Doherty transmitter,* in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2017) pp. 196–199.
- [8] S. B. Weinstein, *The history of orthogonal frequency-division multiplexing [history of communications]*, IEEE Communications Magazine **47**, 26 (2009).
- [9] C. Campopiano and B. Glazer, A Coherent Digital Amplitude and Phase Modulation Scheme, IRE Transactions on Communications Systems 10, 90 (1962).
- [10] H. Chireix, *High Power Outphasing Modulation*, Proceedings of the Institute of Radio Engineers 23, 1370 (1935).
- [11] D. Cox, *Linear Amplification with Nonlinear Components*, IEEE Transactions on Communications 22, 1942 (1974).
- [12] J. H. Qureshi, M. J. Pelk, M. Marchetti, W. C. E. Neo, J. R. Gajadharsing, M. P. van der Heijden, and L. C. N. de Vreede, A 90-W Peak Power GaN Outphasing Amplifier With Optimum Input Signal Conditioning, IEEE Transactions on Microwave Theory and Techniques 57, 1925 (2009).
- [13] M. P. van der Heijden, M. Acar, J. S. Vromans, and D. A. Calvillo-Cortes, A 19W high-efficiency wide-band CMOS-GaN class-E Chireix RF outphasing power amplifier, in 2011 IEEE MTT-S International Microwave Symposium (2011) pp. 1–4.
- [14] M. P. van der Heijden and M. Acar, A radio-frequency reconfigurable CMOS-GaN class-E Chireix power amplifier, in 2014 IEEE MTT-S International Microwave Symposium (IMS2014) (2014) pp. 1–4.
- [15] L. R. Kahn, Single-Sideband Transmission by Envelope Elimination and Restoration, Proceedings of the IRE 40, 803 (1952).

- [16] W. Yuan and J. S. Walling, A multiphase switched capacitor power amplifier in 130nm CMOS, in 2016 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2016) pp. 210–213.
- [17] J. E. Volder, *The CORDIC Trigonometric Computing Technique*, IRE Transactions on Electronic Computers EC-8, 330 (1959).
- [18] R. Andraka, A Survey of CORDIC Algorithms for FPGA Based Computers, in Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, FPGA '98 (Association for Computing Machinery, New York, NY, USA, 1998) pp. 191–200.
- [19] S. Luschas, R. Schreier, and H.-S. Lee, *Radio Frequency Digital-to-Analog Converter*, IEEE Journal of Solid-State Circuits **39**, 1462 (2004).
- [20] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, A Wideband 2 × 13-bit All-Digital I/Q RF-DAC, IEEE Transactions on Microwave Theory and Techniques 62, 732 (2014).
- [21] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, *A Wideband Linear I/Q-Interleaving DDRM*, IEEE Journal of Solid-State Circuits **53**, 1361 (2018).

# 2

# TOWARDS THE OPTIMUM DIGITAL POLAR TRANSMITTER

# **2.1.** INTRODUCTION

The linearity and energy efficiency of a transmitter (TX) depend on many factors such as the TX architecture, the class of the power amplifier (PA), and the efficiency enhancement technique. As discussed in Chapter 1, a polar architecture is superior to its Cartesian counterpart in terms of power efficiency. In a polar TX, the PA is driven by a constant-envelope phase-modulated RF signal, allowing it to be designed as a saturated switching PA to achieve high power efficiency. Therefore, a switch-mode PA is a logical candidate for use in a polar configuration, especially when considering digital-intensive solutions, as it can be directly driven by an RF digital signal. A PA is most power-efficient when reaching its peak output power. As the output power/amplitude is reduced, the efficiency of the PA drops. Consequently, when a PA is driven by a modulated signal, the average output power is lower than the peak output power, which reduces the average power efficiency of the PA. Therefore, an efficiency enhancement technique is necessary to improve the overall energy efficiency of the TX by increasing the PA efficiency at power backoff levels.

In the following discussion, the digital polar architecture using switch-mode class-E operation and Doherty power combining for efficiency enhancement will be briefly explained.

# **2.2.** DIGITAL POLAR TX

An analog polar architecture (see Fig. 1.6) is traditionally used for narrow-band communication systems such as GSM, EDGE and Bluetooth. This is mainly because in analog-intensive solutions, the amplitude and phase modulators have a limited bandwidth compared to Cartesian modulators. However, the digital polar TX architecture (see Fig. 1.8) has recently gained more attention for use in other wireless applications thanks to the novel RFDAC-based implementations of the phase and amplitude modulators, which can provide higher bandwidth performance.

# 2.2.1. PHASE MODULATION

A closed-loop phase modulator such as a PLL (or the more recent all-digital PLL (ADPLL)) typically cannot handle very wideband signals due to the loop bandwidth and VCO (DCO) nonlinearity. Although different techniques such as two-point injection and digital predistortion have been proposed to increase the bandwidth [1], achieving a PM bandwidth of 500 MHz to support a 100 MHz transmitted signal in a polar configuration is still a big challenge for (AD)PLL-based solutions.

Figure 2.1: (a) Current-mode IQ-RFDAC, and (b) RFDAC-based phase modulator with harmonic rejection.

To achieve a larger bandwidth, several open-loop techniques have been proposed which generally modulate the phase outside the PLL loop by combining and/or multiplexing several LO signals with different static phases [2]. Using a direct-digital synthesizer (DDS) is a straightforward way to realize a wideband open-loop phase modulator. However, a DDS can consume much more power [3] than a PLL-based phase modulator, which is typically used for lower modulation bandwidths.

#### **CARTESIAN-BASED PHASE MODULATION**

Among the different open-loop concepts, Cartesian-based phase-modulation is one of the most linear and wideband concepts. In digital-RF implementation, an IQ RFDAC (composed of an I-RFDAC and Q-RFDAC) directly up-converts the digital I/Q data input to the RF frequency, as shown in Fig. 2.1a [4]. Each RFDAC consists of an array of current sources with digital bit-wise sub-mixers.

#### CREATING A CONSTANT-ENVELOPE PM SIGNAL

Since only the phase information is needed, a limiter is required to remove the amplitude information to keep the PA's input signal at a maximum voltage swing. The limiter can be implemented either in the analog domain and placed after the IQ modulator, or in the digital domain and placed at the input. An analog limiter by itself can generate a substantial phase error, due to its input-output delay dependency on the level of the input signal. Alternatively, a digital limiter can be used at the input of a Cartesian-based phase modulator. In principle, a digital limiter converts the input  $\phi$  data to  $\cos(\phi)$  and  $\sin(\phi)$  respectively, thus mapping the constellation diagram to the unit circle, and resulting in a constant envelope signal.
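The digital limiter described above can be illustrated with a minimal Python sketch. The function name `digital_limiter` and the sample values are hypothetical, and `arctan2` merely stands in for the phase output of the CORDIC; the sketch only shows that the resulting I/Q pair has a constant envelope while the phase is preserved.

```python
import numpy as np

def digital_limiter(phi):
    """Map phase samples to a constant-envelope I/Q pair on the unit circle."""
    return np.cos(phi), np.sin(phi)

# Hypothetical I/Q samples with different amplitudes (assumption for illustration)
i_in = np.array([1.0, 3.0, -1.0, -3.0])
q_in = np.array([1.0, 1.0, 3.0, -3.0])

# arctan2 stands in for the phase output of the CORDIC
phi = np.arctan2(q_in, i_in)

i_out, q_out = digital_limiter(phi)
envelope = np.hypot(i_out, q_out)      # magnitude after limiting
phi_out = np.arctan2(q_out, i_out)     # phase after limiting
```

After limiting, every sample lies on the unit circle, so only the phase information reaches the PA input.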

#### HARMONIC REJECTION

Another significant source of inaccuracy in such a phase modulator is the presence of higher harmonics at the output of the IQ modulator, which after passing through the limiter fold back to the in-band frequency and increase the phase error. In the time domain, the zero-crossings of the rising edge of the RFDAC output ($OUT = I \cdot CK_I - Q \cdot CK_Q$) are most of the time identical to the zero-crossings of the rising edge of $CK_Q$. Consequently, there is almost no phase modulation observed at the output of the limiter. Therefore, the harmonics of the phase modulator should be filtered out before passing through the limiter, either by using a passive filter, which can limit the RF frequency range, and/or by using a harmonic rejection technique to relax the filter requirements. Harmonic rejection can be achieved by combining the outputs of three or more IQ RFDAC banks with different LO phases in such a way that the fundamentals are combined constructively, while the harmonics are combined destructively. For example, as depicted in Fig. 2.1b, by having three IQ RFDACs with LO phases of 0°/90°, 45°/135°, and 90°/180°, and scaling the second IQ-RFDAC by a factor of $\sqrt{2}$, the 3<sup>rd</sup> and 5<sup>th</sup> harmonics at the output are canceled, while the even harmonics can be suppressed by a differential design. This technique suppresses all harmonics up to the 7<sup>th</sup> [5, 6].
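The cancellation of the 3<sup>rd</sup> and 5<sup>th</sup> harmonics can be checked numerically with a short sketch. It assumes an idealized model in which the three banks produce identical waveforms that differ only in LO phase (0°, 45°, 90°) and weight ($1$, $\sqrt{2}$, $1$), so that the $k$-th harmonic of a path shifted by phase $p$ is rotated by $k \cdot p$. The function name `harmonic_gain` is hypothetical.

```python
import numpy as np

def harmonic_gain(k, phases_deg=(0.0, 45.0, 90.0),
                  weights=(1.0, np.sqrt(2.0), 1.0)):
    """Magnitude of the k-th LO harmonic after summing the weighted paths.

    A path whose LO is shifted by phase p at the fundamental has its
    k-th harmonic rotated by k*p.
    """
    p = np.deg2rad(np.asarray(phases_deg))
    w = np.asarray(weights)
    return abs(np.sum(w * np.exp(-1j * k * p)))

# Relative gain of the odd harmonics (even ones are assumed to be
# removed by the differential design)
gains = {k: harmonic_gain(k) for k in (1, 3, 5, 7)}
```

In this model the fundamental and the 7<sup>th</sup> harmonic add constructively, while the 3<sup>rd</sup> and 5<sup>th</sup> harmonics sum to zero, which is consistent with the first unsuppressed harmonic being the 7<sup>th</sup>.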



Figure 2.2: Concept of amplitude modulation by a digital PA in a polar TX and the resulting spectra, showing the SSRs of the equivalent analog AM and output signals, assuming an ideal sine-wave PM signal.

#### **2.2.2.** AMPLITUDE MODULATION

In an analog polar TX, the AM modulator is normally a high-efficiency DC-DC converter with an LDO. Such a configuration has a much lower bandwidth than the signal modulator in a Cartesian TX, which is typically a quadrature mixer. In addition, due to bandwidth expansion in a polar configuration, the bandwidth of a transmit signal can be only half of the bandwidth of the AM modulator (or only 1/5 of the bandwidth of the phase modulator, whichever is smaller). To tackle this limitation, an RFDAC-based solution for the amplitude modulation can be used here as well. By removing the DAC, LPF and DC-DC converter/LDO from the AM path in an analog polar TX (Fig. 1.6), the digital AM code-word (ACW) can be directly applied to an array of sub-PA cells to control/modulate the output power (Fig. 1.8). The amplitude modulator and the PA are merged into a single block known as a digital PA (DPA), which acts as a direct amplitude modulator. Figure 2.2 illustrates the concept of amplitude modulation with a digital PA as well as the resulting spectra of the equivalent analog AM and output signals, assuming an ideal sine-wave PM signal. Since there is no analog circuit in the path of the AM signal, the AM bandwidth of a digital PA is only limited by the Nyquist frequency, i.e. half of the sampling rate $F_s$ of the DPA ($F_s$ can easily be as high as 5 GS/s in nanoscale CMOS technologies). In practice, the digital baseband input signal has a rather low sample rate, so before being transmitted, it should be upsampled (and interpolated) as much as possible to push the sampling spectral replicas (SSRs) far away from the center frequency. In this way, they can be filtered out by the antenna/matching network. In a polar TX, this can be done either in the Cartesian domain before the CORDIC, or in the polar domain after the CORDIC, as shown in Fig. 2.3a. The difference in spectral purity and signal accuracy can be considerable depending on the signal bandwidth and interpolation filter, as illustrated in Fig. 2.3b.

Figure 2.3: Cartesian upsampling vs. polar upsampling: (a) the block diagrams, and (b) the resulting output spectra using different interpolation filters (zero-order ZOH, first-order FOH, and second-order SOH).

With a digital zero-order-hold (ZOH) interpolation filter, ideally there is no difference between the two approaches. However, using a first-order-hold (FOH) filter (i.e. linear interpolator) to further suppress the SSRs, results in a substantial increase in the noise floor if upsampling is done in the polar domain, thus degrading the EVM and ACPR. This is because the new amplitude and phase samples are just an estimation and not the exact polar representation of the input signal as if it was upsampled in the Cartesian domain. Therefore, it is preferred to upsample the input signal before its conversion to polar as much as possible, to maximize the signal accuracy. Note this increases the chance of zero-crossings appearing in the upsampled data, which can cause major difficulties for a polar signal representation. However, due to the constraints on computation speed needed for the CORDIC implementation, one might opt for upsampling after the CORDIC, compromising the in-band signal accuracy for out-of-band spectral purity. In addition, there is the effect of AM and PM SSRs in combination with the presence of the higher harmonics of the RF carrier signal. This will be discussed in more detail in Chapter 5.
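The difference between the two upsampling orders can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random complex samples, the upsampling factor of 4, and all function names are hypothetical, and `np.abs`/`np.angle` stand in for the CORDIC. The sketch shows that ZOH gives identical results in both domains, whereas FOH in the polar domain deviates from FOH in the Cartesian domain between the original sample points.

```python
import numpy as np

def upsample_zoh(x, factor):
    """Zero-order hold: repeat every sample `factor` times."""
    return np.repeat(x, factor)

def upsample_foh(x, factor):
    """First-order hold: linear interpolation between neighbouring samples."""
    n_in = np.arange(len(x))
    n_out = np.arange((len(x) - 1) * factor + 1) / factor
    return np.interp(n_out, n_in, x)

def to_polar(iq):
    """Stand-in for the CORDIC: amplitude and unwrapped phase."""
    return np.abs(iq), np.unwrap(np.angle(iq))

def cartesian_then_polar(iq, factor, hold):
    """Upsample I and Q first, then convert to polar."""
    up = hold(iq.real, factor) + 1j * hold(iq.imag, factor)
    return to_polar(up)

def polar_then_upsample(iq, factor, hold):
    """Convert to polar first, then upsample amplitude and phase."""
    am, pm = to_polar(iq)
    return hold(am, factor), hold(pm, factor)

def rebuild(am, pm):
    """Recombine amplitude and phase into a complex envelope."""
    return am * np.exp(1j * pm)

rng = np.random.default_rng(0)
iq = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # hypothetical samples

zoh_cart = rebuild(*cartesian_then_polar(iq, 4, upsample_zoh))
zoh_pol = rebuild(*polar_then_upsample(iq, 4, upsample_zoh))
zoh_same = np.allclose(zoh_cart, zoh_pol)

foh_cart = rebuild(*cartesian_then_polar(iq, 4, upsample_foh))
foh_pol = rebuild(*polar_then_upsample(iq, 4, upsample_foh))
foh_error = np.abs(foh_cart - foh_pol)   # zero only at the original sample points
```

The interpolated amplitude and phase samples in the polar path are estimates only, so the deviation `foh_error` appears as additional error in the reconstructed signal.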



Figure 2.4: Conceptual operation comparison of a transconductance and a switch-mode PA at the fundamental frequency, without showing the LC resonators for simplicity: (a) simplified model of a transconductance PA, (b) simplified model of a switch-mode PA, (c) simplified Thevenin (voltage-mode) model, and (d) simplified Norton (current-mode) model at the fundamental frequency.

# **2.3.** DIGITAL POWER AMPLIFIER

# **2.3.1.** CONCEPT OF THE SWITCH-MODE PA

The most energy-efficient PAs are the switch-mode PAs, which are quite compatible with CMOS technology and digital-intensive solutions as they can be directly driven by square-wave signals. They also facilitate a high integration level. The behavior of a switch-mode PA is similar to that of a resistive divider; thus, the output amplitude depends not only on $R_{ON}$, but also on $V_{DD}$. This contrasts with a transconductance PA (such as class-A, B, ...) where the output amplitude is proportional to the $g_m$ of the PA, and hence independent of $V_{DD}$ to the first order. Figure 2.4 illustrates this difference in operation conceptually at the fundamental frequency, without showing the inductors and capacitors for simplicity. In Fig. 2.4b, a simple switch-mode amplifier is shown that generates a square wave at the output with a peak-to-peak amplitude of $V_{DD}R_L/(R_{ON} + R_L)$. The amplitude of the first harmonic would be $2V_{DD}R_L/(\pi(R_{ON} + R_L))$. In Fig. 2.4c (Fig. 2.4d), the switch is replaced with a voltage source (or current source) with an amplitude of $2V_{DD}/\pi$ (or $2V_{DD}/(\pi R_{ON})$) to generate the same output voltage as the circuit in Fig. 2.4b at the fundamental frequency. As the output amplitude of a switch-mode PA is a function of $R_{ON}$, it is possible to control the output amplitude/power by controlling this parameter. This will be explained in more detail in Section 2.3.4.
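As a quick numerical check of the equivalence between the Thevenin and Norton models of Fig. 2.4, the following sketch evaluates the fundamental output amplitude for both. The function names and component values ($V_{DD}$ = 1 V, $R_L$ = 50 Ω, $R_{ON}$ = 5 Ω) are assumptions chosen for illustration only.

```python
import math

def amplitude_thevenin(v_dd, r_on, r_l):
    """Voltage source 2*V_DD/pi in series with R_ON, loaded by R_L."""
    v_src = 2.0 * v_dd / math.pi
    return v_src * r_l / (r_on + r_l)

def amplitude_norton(v_dd, r_on, r_l):
    """Current source 2*V_DD/(pi*R_ON) driving R_ON in parallel with R_L."""
    i_src = 2.0 * v_dd / (math.pi * r_on)
    r_parallel = r_on * r_l / (r_on + r_l)
    return i_src * r_parallel

# Hypothetical values for illustration
v_dd, r_l, r_on = 1.0, 50.0, 5.0
a_thevenin = amplitude_thevenin(v_dd, r_on, r_l)
a_norton = amplitude_norton(v_dd, r_on, r_l)
a_lower_ron = amplitude_thevenin(v_dd, r_on / 2.0, r_l)   # halving R_ON raises the amplitude
```

Both models return the same fundamental amplitude, and lowering $R_{ON}$ raises it, which is the mechanism used for amplitude control.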

Different classes of switch-mode operation exist: class-$F/F^{-1}$, which relies on a complicated load network to control higher harmonics (one resonator for each harmonic); class-$D/D^{-1}$, which can suffer from efficiency loss due to the output drain capacitance; and class-E, which has one of the simplest load networks while absorbing the output drain capacitance, offering 100% peak drain efficiency in theory [7–13]. In the next section, the class-E power amplifier is briefly introduced.

#### 2.3.2. CLASS-E PA

In Fig. 2.5a, the typical circuit configuration of a class-E PA is shown. $L_D$ is the DC-feed inductance and $C_D$ is the drain capacitance. Depending on the resonance factor $q_D = 1/(\omega_0 \sqrt{L_D C_D})$, $jX$ is sometimes added to compensate for the phase shift. The series resonator $(L_0, C_0)$ is tuned at the center frequency $\omega_0$, ideally with a very high quality factor to filter out higher harmonics, although doing so will limit the bandwidth as well (this will be addressed in more detail in Chapter 4). The combination of $L_D$ and $C_D$ forms a shunt resonator which shapes the drain voltage and current waveforms to a large extent, and hence determines the exact operation class. The design equations for a class-E PA are given as [14]:

$$R_L = K_P V_{DD}^2 / P_{OUT} \tag{2.1}$$

$$L_D = K_L R_L / \omega_0 \tag{2.2}$$

$$C_D = K_C / (R_L \omega_0) \tag{2.3}$$

$$X = K_X R_L \tag{2.4}$$

Basically, if we assume an ideal switch with an ideal series resonator, for a given load, one simply needs to tune $L_D$ and $C_D$ to ensure that the drain voltage is zero when the transistor is switched on, thus avoiding any power loss due to the discharging of $C_D$. This condition is known as "zero-voltage switching" (ZVS). Different design sets of $K = \{K_L, K_C, K_P, K_X\}$ can be derived as functions of $q_D$. Using the boundary conditions of ZVS, as well as "zero-derivative-voltage switching" (ZdVS) to make the design robust to small variations in $L_D$ and $C_D$, the optimum values of these parameters for maximum efficiency assuming a 50% duty cycle are plotted in Fig. 2.5b [14]. One of the most popular sub-classes of the class-E PA is the parallel-circuit class-E, with the design parameters $\{K_L = 0.732, K_C = 0.685, K_P = 1.365, K_X = 0\}$ for $q_D = 1.412$. In this type of class-E PA, the power scaling factor $K_P$ is at its maximum, yielding maximum output power for a given $V_{DD}$ and $R_L$. Consequently, it allows the use of a larger resistance $R_L$ for a given output power, which can result in a more compact and efficient matching network design in an on-chip implementation.

Figure 2.5: (a) Class-E PA circuit, (b) calculated optimum values of $K_L$, $K_C$, $K_P$, $K_X$ vs. $q_D$ with ZVS and ZdVS, and a 50% duty cycle from $q_D$=0.6 to $q_D$=1.8 [14], and (c) drain voltage and current for different designs with { $\alpha = 0, q_D = 1.41$ } and { $\alpha = 1, q_D = 1.23$ }.
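Equations (2.1) to (2.4) can be turned into a small design helper. The sketch below uses the parallel-circuit class-E parameter set quoted above; the target output power, supply voltage and center frequency are hypothetical values, and the function name `class_e_design` is an assumption. As a sanity check, the resonance factor recomputed from the resulting $L_D$ and $C_D$ equals $1/\sqrt{K_L K_C}$, independent of $R_L$ and $\omega_0$.

```python
import math

def class_e_design(p_out, v_dd, f0, k_l, k_c, k_p, k_x):
    """Evaluate Eqs. (2.1)-(2.4) for a given design set K."""
    w0 = 2.0 * math.pi * f0
    r_l = k_p * v_dd ** 2 / p_out      # Eq. (2.1)
    l_d = k_l * r_l / w0               # Eq. (2.2)
    c_d = k_c / (r_l * w0)             # Eq. (2.3)
    x = k_x * r_l                      # Eq. (2.4)
    q_d = 1.0 / (w0 * math.sqrt(l_d * c_d))
    return {"R_L": r_l, "L_D": l_d, "C_D": c_d, "X": x, "q_D": q_d}

# Parallel-circuit class-E set from the text; P_OUT, V_DD and f0 are
# hypothetical example values
design = class_e_design(p_out=1.0, v_dd=1.2, f0=2.0e9,
                        k_l=0.732, k_c=0.685, k_p=1.365, k_x=0.0)
```

For these example inputs the load resistance is about 1.97 Ω, and the recomputed $q_D$ is about 1.412, matching the value stated for the parallel-circuit class-E.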

Theoretically, in an ideal class-E PA with zero $R_{ON}$, the overlap between the drain voltage and the drain current is zero, as plotted in Fig. 2.5c, thus achieving 100% drain efficiency (DE). In practice, $R_{ON}$ is not zero, so the drain voltage increases slightly when the switch is on, thereby reducing the efficiency. The drain efficiency of the class-E PA can be calculated as follows [15]:

$$P_{SW} = \frac{1}{4\pi} \cdot \frac{V_{SW}^2}{Z_{C_D}} \tag{2.5}$$

$$P_{R_{ON}} = I_{RMS}^2 R_{ON} \tag{2.6}$$

$$P_{DC} = V_{DD}I_{DC} \tag{2.7}$$

$$DE = \frac{P_{OUT}}{P_{DC}} \approx 1 - \frac{P_{SW} + P_{R_{ON}}}{P_{DC}} = 1 - \left(\frac{V_{SW}^2}{4\pi Z_{C_D} V_{DD} I_{DC}} + \frac{I_{RMS}^2 R_{ON}}{V_{DD} I_{DC}}\right) \tag{2.8}$$

where $V_{SW}$ is the drain voltage ($V_{DS}$) at the switching time, $Z_{C_D}$ is the impedance of the drain capacitance, $I_{RMS}$ is the RMS of the drain current ($I_{DS}$), and $P_{SW}$, $P_{R_{ON}}$, and $P_{DC}$ are the switching power loss due to the discharging of the drain capacitance, the power dissipation due to the nonzero $R_{ON}$, and the input DC power, respectively. With ZVS, $P_{SW}$ is zero. One of the biggest challenges with the class-E PA is that the peak of the drain voltage can reach above $3 \times V_{DD}$, reducing the reliability if the nominal $V_{DD}$ is used for a given device technology. However, with a sub-optimum design, where instead of ZVS we would have $V_{DS} = \alpha V_{DD}$ at the switching time, it has been shown [16] that by increasing the design parameter $\alpha$ from 0 to 1, the peaks of $V_{DS}$ and $I_{DS}$ reduce from $3.65 \times V_{DD}$ and $2.65 \times I_{DC}$ to $3 \times V_{DD}$ and $2.3 \times I_{DC}$, respectively, as shown in Fig. 2.5c. This allows more output power to be obtained by increasing $V_{DD}$ in the same technology, at the cost of a ~3-5 % reduction in the theoretical drain efficiency.
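Equations (2.5) to (2.8) translate directly into a small helper, sketched below. The function name and all numerical values are hypothetical and serve only to show how the two loss terms lower the drain efficiency from its ideal value of 100%.

```python
import math

def class_e_drain_efficiency(v_sw, z_cd, i_rms, r_on, v_dd, i_dc):
    """Approximate drain efficiency according to Eq. (2.8)."""
    p_sw = v_sw ** 2 / (4.0 * math.pi * z_cd)   # Eq. (2.5): switching loss
    p_ron = i_rms ** 2 * r_on                   # Eq. (2.6): conduction loss
    p_dc = v_dd * i_dc                          # Eq. (2.7): DC input power
    return 1.0 - (p_sw + p_ron) / p_dc

# Ideal case: ZVS (V_SW = 0) and zero on-resistance
de_ideal = class_e_drain_efficiency(v_sw=0.0, z_cd=10.0, i_rms=1.5,
                                    r_on=0.0, v_dd=1.0, i_dc=1.0)

# ZVS with a nonzero on-resistance (hypothetical values)
de_ron = class_e_drain_efficiency(v_sw=0.0, z_cd=10.0, i_rms=1.5,
                                  r_on=0.05, v_dd=1.0, i_dc=1.0)

# Sub-optimum switching (V_SW = V_DD) adds a switching-loss term
de_sub = class_e_drain_efficiency(v_sw=1.0, z_cd=10.0, i_rms=1.5,
                                  r_on=0.05, v_dd=1.0, i_dc=1.0)
```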

There are other sub-classes of class-E PAs based on different values of  $q_D$ , such as the original RF-choke class-E with  $q_D = 0$  [7, 8], the even-harmonic resonant class-E with  $q_D = 2n$  [17], and the load-insensitive class-E with  $q_D = 1.3$  [18, 19], the latter of which will be discussed next.

#### **2.3.3.** LOAD-INSENSITIVE CLASS-E PA

In general, when the load of a class-E PA is varied, we would ideally prefer the peak efficiency to stay at its maximum; in practical implementations, however, it may drop significantly depending on the circuit parameters. This is because the load condition of the circuit is changing and the overlap between the drain voltage and current may no longer be zero. In a transconductance PA, this situation is less severe, because as the load decreases (or increases), we can simply increase (or decrease) the amplitude of the input signal to maintain the drain voltage at its maximum, and hence maximum efficiency. This cannot be easily accomplished in a class-E PA, as we have no direct control over the overlap between the drain voltage and current waveforms. However, prior studies [18, 19] show that for a design parameter set based on $q_D = 1.3$, the class-E PA starts to behave differently. With this specific value ($q_D = 1.3$), the class-E PA responds to load variations by changing the slope of the drain voltage at the switch-on time, while keeping the overlap between the drain voltage and current almost zero. Therefore, the drain efficiency does not drop significantly, i.e. it is largely insensitive to the load. This property is especially interesting in designing a Doherty or outphasing PA, which are both based on load modulation.

Figure 2.6: Simplified single-ended class-E DPA with the ACW-AM and ACW-PM curves.

The class-E PA is neither quite a current-mode PA, such as the class-A, B, ... PAs, nor a voltage-mode PA, such as the switched-capacitor class-D PA [20]. It is something in between, i.e. a current source with a limited and modulated $R_{Out}$, or a voltage source with a non-zero and modulated $R_{Out}$, which has a big impact on the linearity behavior of the class-E digital PA.

### 2.3.4. CLASS-E DPA ARRAY

A class-E digital power amplifier (DPA) array can be simply made by connecting the drains of multiple transistors, as depicted in Fig. 2.6 [20–31]. Each of these transistors is enabled/disabled individually through a digital logic AND-gate which feeds the input RF PM signal to the gate. The overall array can be seen as a single switching transistor with a programmable width. In principle, the output power is controlled by modulating the $R_{ON}$ of this switch. At maximum power, all of the transistors are enabled and work in parallel, thus the effective $R_{ON}$ is at its minimum and DE is at its maximum. By using the simple model in Fig. 2.4, one can observe that the relation between the output amplitude and input ACW is highly nonlinear (assuming $R_{ON} \propto 1/ACW$), resulting in significant ACW-AM distortion. Furthermore, with a parasitic drain capacitance at the output, it can be seen intuitively that modulating $R_{ON}$ would modulate the output phase as well, resulting in substantial ACW-PM distortion. Using the Norton equivalent of the switch-mode array (similar to the model in Fig. 2.4d), an extensive analysis of the linearity of different class-E PA designs including the load/matching networks is presented in Chapters 3 and 4, verified by simulation and measurement results.
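The compressive ACW-AM behavior predicted by the simple resistive-divider model of Fig. 2.4 can be sketched as follows. This is an illustration only: it assumes $R_{ON} = R_{unit}/ACW$, ignores the class-E load network and the drain capacitance (so it says nothing about ACW-PM), and uses hypothetical values for the number of bits, the unit-cell on-resistance and the load.

```python
import numpy as np

def acw_am_curve(n_bits, r_unit, r_l, v_dd=1.0):
    """Fundamental output amplitude vs. ACW for the resistive-divider model."""
    acw = np.arange(1, 2 ** n_bits)          # codes 1 .. 2^n - 1 (code 0: all cells off)
    r_on = r_unit / acw                      # parallel unit cells: R_ON proportional to 1/ACW
    amp = (2.0 * v_dd / np.pi) * r_l / (r_on + r_l)
    return acw, amp

# Hypothetical 8-bit array with 500-ohm unit cells driving a 10-ohm load
acw, amp = acw_am_curve(n_bits=8, r_unit=500.0, r_l=10.0)
amp_norm = amp / amp[-1]                     # normalized ACW-AM curve
ideal_line = acw / acw[-1]                   # perfectly linear reference
```

The normalized amplitude lies above the straight line for every code below full scale, i.e. the curve compresses as more cells are turned on.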

#### **2.3.5.** MODULATED EFFICIENCY VS. CW EFFICIENCY

A power amplifier is typically most efficient when the output power is at its maximum (i.e. the drain voltage becomes saturated). By decreasing the input ACW, $R_{ON}$ increases and DE drops, as shown in Fig. 2.7a. Although it seems trivial to keep the PA at maximum efficiency by using constant-envelope signals at the maximum level, this would result in low spectral efficiency. With such a modulation method (e.g. GMSK in the GSM communication standard), only one parameter of the RF signal (i.e. phase) is modulated. Therefore, the spectral bandwidth required for transferring the same amount of data is at least a factor of two larger. Consequently, the amplitude is also modulated to improve the spectral efficiency, such as in QAM or OFDM signals. This results in an instantaneous variation in the output power, hence the average output power becomes much smaller than the peak power. This is characterized by the peak-to-average-power ratio ($PAPR = AM_{OUT,Max}/AM_{OUT,RMS}$). Normally the PAPR is about 6-7 dB for raw QAM signals. As the PAPR increases, the average efficiency decreases. The relation between the average efficiency ($\overline{DE}$) of modulated signals and the efficiency measured for a continuous-wave (CW) signal can be expressed as follows:

$$\overline{DE} = \frac{\overline{P_{OUT}}}{\overline{P_{DC}}} \;\overset{\text{generally}}{\neq}\; \overline{\left(\frac{P_{OUT}}{P_{DC}}\right)} \tag{2.9}$$

$$\overline{P_{OUT}} = \frac{\overline{AM_{OUT}^2}}{2R_L} = \frac{AM_{OUT,RMS}^2}{2R_L} \tag{2.10}$$

where the over-line $\overline{\Box}$ denotes the statistical average. $AM_{OUT,RMS}$ is the statistical RMS of the output amplitude (not to be mistaken with the time-domain voltage). In a basic class-A PA the input DC power $\overline{P_{DC}}$ is constant versus the variation in output power, and the average efficiency is equal to $\overline{DE} = \frac{AM_{OUT,RMS}^2}{2V_{DD}^2}$, which has the same form as its CW efficiency $DE_{CW} = \frac{AM_{OUT}^2}{2V_{DD}^2}$. Therefore, the class-A average efficiency is equal to the efficiency measured with a CW signal at the power backoff (PBO) level equal to the PAPR. Although this way of estimating the average efficiency is conventional, it is not always correct, as depicted in Fig. 2.7a. Such an estimation is accurate only if $\overline{DE}$ has the same form as $DE_{CW}$. For example, in a class-B PA, the DC current is proportional to the output amplitude ($I_{DC} = \frac{2AM_{OUT}}{\pi R_L}$), hence $\overline{P_{DC}} = \frac{2V_{DD}\overline{AM_{OUT}}}{\pi R_L}$ and $DE_{CW} = \frac{\pi}{4} \frac{AM_{OUT}}{V_{DD}}$. In this case, $\overline{DE} = \frac{\pi}{4} \frac{AM_{OUT,RMS}^2}{V_{DD}\overline{AM_{OUT}}}$. Therefore, $\overline{DE}$ would have the same form as $DE_{CW}$ only if $\overline{AM_{OUT}} = AM_{OUT,RMS}$. By defining a correction factor as $C_{DE}(dB) = 20\log\left(\frac{AM_{OUT,RMS}}{\overline{AM_{OUT}}}\right)$, one can find the correct average DE from the $DE_{CW}$ plot by adding $C_{DE}(dB)$ to the average power (or subtracting it from the PAPR (dB)). However, as shown in Fig. 2.7b, for modulated signals with a PAPR below 8 dB, this error is less than 1 dB and can be neglected most of the time. Therefore, with a 6 dB PAPR modulated signal, the average efficiency reduces by a factor of 0.25, 0.5, and 0.39 for class-A, class-B, and class-E PAs, compared to their theoretical peak drain efficiencies, which are 50 %, 78.5 %, and 100 %, respectively. Thus, with a 6 dB PAPR signal, the overall average efficiencies for class-A, class-B, and class-E PAs would be 12.5 %, 39.3 % and 39 %, respectively. This reduction is even more pronounced in advanced wireless communication standards where carrier-aggregated OFDM signals with a ~15 dB PAPR are used.

Figure 2.7: (a) CW drain efficiency vs. normalized output voltage compared to the average and CW drain efficiencies of ideal class-A and class-B PAs, as well as the probability distribution function (PDF) of a QAM signal, and (b) the DE correction factor $C_{DE}$ (dB) vs. PAPR (dB).
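The statistics used in this section can be evaluated for any set of amplitude samples, as the following sketch shows. It assumes ideal class-A and class-B PAs whose peak amplitude equals $V_{DD}$, and it uses the amplitudes of a raw 16-QAM constellation (no pulse shaping) purely as an example; the function name and the choice of signal are assumptions.

```python
import numpy as np

def efficiency_stats(am, v_dd):
    """PAPR, correction factor C_DE and average DE of ideal class-A/B PAs."""
    am = np.asarray(am, dtype=float)
    am_rms = np.sqrt(np.mean(am ** 2))       # statistical RMS of the amplitude
    am_mean = np.mean(am)                    # statistical mean of the amplitude
    return {
        "papr_db": 20.0 * np.log10(np.max(am) / am_rms),
        "c_de_db": 20.0 * np.log10(am_rms / am_mean),
        "de_class_a": am_rms ** 2 / (2.0 * v_dd ** 2),
        "de_class_b": (np.pi / 4.0) * am_rms ** 2 / (v_dd * am_mean),
    }

# Amplitudes of the 16 points of a raw 16-QAM constellation (example signal)
levels = np.array([-3.0, -1.0, 1.0, 3.0])
i_grid, q_grid = np.meshgrid(levels, levels)
am_qam = np.hypot(i_grid, q_grid).ravel()

stats_qam = efficiency_stats(am_qam, v_dd=np.max(am_qam))
stats_cw = efficiency_stats(np.ones(8), v_dd=1.0)    # CW signal at full amplitude
```

For the CW case the helper returns the theoretical peak efficiencies of 50 % and 78.5 % with $C_{DE}$ = 0 dB; for any amplitude-modulated signal, $C_{DE}$ is positive because the RMS value always exceeds the mean.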



Figure 2.8: (a) Conventional symmetrical current-mode Doherty PA, (b) simplified model, and (c) concept of voltage-mode Doherty PA [32].

# **2.4.** DOHERTY EFFICIENCY ENHANCEMENT

There are various techniques to enhance the efficiency at power backoff (PBO). The most popular ones are envelope tracking and EER, which are based on supply modulation, and the Doherty and Chireix techniques, which are load modulation techniques. In the envelope tracking technique [13, 33, 34], as the name suggests, the $V_{DD}$ of the PA tracks the envelope of the RF signal, thus keeping the DC drop over the transistor channel at a minimum. However, it has the same bandwidth limitation as the AM modulator of an analog polar TX. This technique works very well with transconductance PAs; however, it results in extra ACW-AM distortion in class-E PAs, as their output amplitude already depends on $V_{DD}$. The envelope elimination and restoration (EER) technique [13, 35] is actually an analog polar configuration, which can work very well with a switch-mode PA such as a class-E PA. The Chireix technique is based on reactance compensation in the power combiner of an outphasing PA to create load modulation between the two active devices [13, 36, 37]. Since the Chireix technique deals with a complex load, it cannot be made wideband inherently; therefore, it is of less interest when designing wideband transmitters. In addition, it uses higher load modulation ratios at PBO levels than the Doherty concept, which makes it more narrowband and less compatible with CMOS designs because of the limited output impedance of CMOS at high PBO. With EER or Doherty, this is less of a challenge than with Chireix.

The Doherty technique [1, 6, 13, 32, 38–45] is based on load modulation. The basic idea behind a Doherty PA (comprising a Main and a Peak PA) is as follows: as the input power increases, the drain voltage of the Main PA increases as well until it reaches the saturation point, resulting in maximum drain efficiency. After this transition point, an auxiliary (Peak) PA starts turning on, not only to deliver extra power, but also to keep the Main PA close to voltage saturation by pulling down its load. Thus, the output current/power of the Main PA increases while its drain voltage remains constant. This idea can be implemented in both the current mode, as proposed by Doherty himself in 1936 [38], and the voltage mode [32], as depicted in Fig. 2.8.

Figure 2.9: (a) Drain voltages in an ideal symmetrical current-mode Doherty PA, (b) load modulation seen by the Main and Peak PAs, (c) drain currents, (d) drain efficiencies assuming class-B operation, as well as the PDF of a QAM signal.

Figure 2.8a shows the simplified configuration of a conventional symmetrical Doherty PA. In a symmetrical current-mode Doherty PA using transconductance PAs, the $g_m$ of the Peak should be twice the $g_m$ of the Main to ensure that the drain voltage of the Main is maintained at its maximum (Fig. 2.9b). The output quarter-wave ($\lambda/4$) transmission line (QWTL) works as an impedance inverter at $\omega_0$, and at the input it adjusts the phase offset of the Peak signal path. At the beginning, when the input power is small, only the Main PA is active and thus the impedance seen by it is equal to $4R_L = Z_0^2/R_L$ (Fig. 2.9b). Since the Peak PA is off (zero output current), the load that it sees is infinite. Once the input power reaches beyond 6 dB PBO (i.e. the output amplitude is more than half of its maximum), the Peak PA turns on and delivers some current to the load (Fig. 2.9c). As a result, the load seen by the QWTL increases; consequently, the load seen by the Main PA decreases due to the impedance inversion, while the load seen by the Peak PA decreases as well. From the transition point until the Peak PA reaches its saturation point, the voltage swing, and as such the DE of the Main PA, is maintained at its maximum while the DE of the Peak PA increases until eventually it reaches its maximum. Therefore, the very characteristic profile of Doherty drain efficiency is produced, as shown in Fig. 2.9d. As a result, the overall average drain efficiency improves significantly, especially if the PBO level at the backoff maximum efficiency closely matches the PAPR of the signal.

Figure 2.10: (a) Concept of the digital Doherty PA, and (b) simplified class-E digital Doherty PA with the ACW-AM and ACW-PM curves.
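The characteristic efficiency profile described above can be reproduced with the standard textbook expression for an ideal symmetrical Doherty PA with class-B branches. This expression is not taken from this chapter; it follows from a Main fundamental current proportional to the normalized output voltage $v$ and a Peak current proportional to $2v - 1$ above the transition point, which gives $DE = \frac{\pi}{2}\frac{v^2}{3v - 1}$ for $0.5 \le v \le 1$. The function names are hypothetical.

```python
import numpy as np

def class_b_efficiency(v):
    """Ideal class-B drain efficiency vs. normalized output voltage v (0..1)."""
    return (np.pi / 4.0) * np.asarray(v, dtype=float)

def doherty_efficiency(v):
    """Ideal symmetrical Doherty drain efficiency (class-B Main and Peak)."""
    v = np.asarray(v, dtype=float)
    low = (np.pi / 2.0) * v                          # only the Main PA is active
    denom = np.where(v > 0.5, 3.0 * v - 1.0, 1.0)    # guard the unused branch
    high = (np.pi / 2.0) * v ** 2 / denom            # Main saturated, Peak active
    return np.where(v <= 0.5, low, high)

v = np.linspace(0.0, 1.0, 101)
de_doherty = doherty_efficiency(v)
de_class_b = class_b_efficiency(v)
```

In this idealized model, the efficiency reaches 78.5 % both at 6 dB backoff ($v$ = 0.5) and at full power, whereas a single class-B PA delivers only half of that at $v$ = 0.5.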

#### DIGITAL DOHERTY PA

By modifying the concept illustrated in Fig. 2.8a through the introduction of two digitally controlled arrays of sub-PAs as the Main and Peak PAs, one can create a digital Doherty PA, as shown in Fig. 2.10a. Here the input RF signal is typically a constant-envelope phase-modulated RF clock. Therefore, the output voltage and current are controlled by changing the total number of active sub-PA cells. The Main and Peak DPAs can be realized in a class-E mode of operation, as already discussed in the previous section. Therefore, a class-E digital Doherty PA can be realized as shown in Fig. 2.10b. In such a configuration, by increasing the input ACW, first, all of the unit cells of the Main PA are activated to reach maximum drain efficiency. Then, by further increasing the input ACW, the Main code-word ($ACW_M$) stops increasing, while the Peak code-word ($ACW_P$) starts increasing to eventually activate all of the Peak unit cells. This is different from the conventional Doherty PA in the sense that, as mentioned above, after the transition point, the input power of the Main transconductance PA still continues increasing, while here $ACW_M$ stops increasing <sup>1</sup>. In Chapter 5, the details of the design, implementation and measurement results, as well as the linearity analysis of a class-E digital Doherty PA, are presented.
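The code-word distribution described above can be sketched as a simple splitting function. The function name, the cell counts and the assumption that the input ACW maps one-to-one onto unit cells are hypothetical; the actual mapping used in the implementation is described in Chapter 5.

```python
def split_acw(acw, n_main, n_peak):
    """Split the input ACW into Main and Peak code-words.

    The Main cells are activated first; once all of them are on, ACW_M
    stays constant and the remaining codes activate Peak cells.
    """
    if not 0 <= acw <= n_main + n_peak:
        raise ValueError("ACW out of range")
    acw_m = min(acw, n_main)
    acw_p = acw - acw_m
    return acw_m, acw_p

# Hypothetical array with 256 Main and 256 Peak unit cells
below_transition = split_acw(100, 256, 256)   # only Main cells are active
above_transition = split_acw(300, 256, 256)   # Main is full, Peak takes the rest
full_scale = split_acw(512, 256, 256)         # all cells are active
```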

#### CONCLUSION

In this chapter, the polar TX and, in particular, the digital polar TX architectures are described. Since the phase modulator performance is critical to the polar configuration, a wideband RFDAC-based architecture for the phase modulator is described. Furthermore, it is shown how the amplitude modulator can be merged into the PA to form a digital PA (DPA). The structure and design of the class-E PA and, subsequently, the class-E DPA are briefly explained. The linearity challenges of the class-E DPA, along with the design details and implementation, will be extensively described in Chapter 4.

<sup>1</sup>In fact, as the class-E DPA works like a resistive divider, further increasing the ACW would simply further decrease $R_{ON}$, which at some point (ideally the transition point) no longer has any effect. See Fig. 2.4.

Furthermore, the relation between the CW efficiency and average efficiency is analyzed in detail, and it is shown that, even though this is conventionally practiced, there is no exact relation between the two through the PAPR without applying a correction factor. Moreover, it is shown that the PAPR of the signal causes the PA to deviate from its optimum efficiency. Consequently, efficiency enhancement techniques such as the Doherty configuration are introduced to enhance the efficiency at power back-off levels. Finally, the overall structure of a digital class-E Doherty PA is briefly described, which will be explained in detail in Chapter 5.

#### REFERENCES

- [1] W. Wu, R. B. Staszewski, and J. R. Long, A 56.4-to-63.4 GHz Multi-Rate All-Digital Fractional-N PLL for FMCW Radar Applications in 65 nm CMOS, IEEE Journal of Solid-State Circuits 49, 1081 (2014).
- [2] N. Nidhi, S. Pin-En, and P. Sudhakar, Open-Loop Wide-Bandwidth Phase Modulation Techniques, Journal of Electrical and Computer Engineering (2011), 10.1155/2011/507381.
- [3] Analog Devices Inc., *AD9856 CMOS 200 MHz Quadrature Digital Upconverter Data Sheet (Rev. C)*, (2005).
- [4] Y. Shen, M. Polushkin, M. Mehrpoo, M. Hashemi, E. McCune, M. S. Alavi, and L. C. N. de Vreede, A wideband I/Q RFDAC-based phase modulator, in 2018 IEEE 18th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF) (2018) pp. 8–11.
- [5] M. Mehrpoo, M. Hashemi, Y. Shen, R. van Leuken, M. S. Alavi, and L. C. N. de Vreede, A wideband linear direct digital RF modulator using harmonic rejection and I/Q-interleaving RF DACs, in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2017) pp. 188–191.
- [6] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, A fully-integrated digital-intensive polar Doherty transmitter, in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2017) pp. 196–199.

- [7] G. D. Ewing, *High-Efficiency Radio-Frequency Power Amplifiers*, Ph.D. thesis, Dept. Elect. Eng., Oregon State University (1964).
- [8] N. O. Sokal and A. D. Sokal, Class E-A New Class of High-Efficiency Tuned Single-Ended Switching Power Amplifiers, IEEE J. of Solid-State Circuits 10, 168 (1975).
- [9] S. A. El-Hamamsy, Design of High-Efficiency RF Class-D Power Amplifier, IEEE Trans. Power Electronics 9, 297 (1994).
- [10] H. Kobayashi, J. Hinrichs, and P. M. Asbeck, *Current Mode Class-D Power Amplifiers* for High Efficiency RF Applications, in 2001 IEEE MTT-S Int. Microw. Symp. Digest (Cat. No.01CH37157), Vol. 2 (2001) pp. 939–942 vol.2.
- [11] F. H. Raab, *Class-E, Class-C, and Class-F Power Amplifiers based upon a Finite Number of Harmonics*, IEEE Trans. on Microw. Theory Techn. **49**, 1462 (2001).
- [12] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, *The Class-E/F Family of ZVS Switching Amplifiers*, IEEE Trans. on Microw. Theory Techn. **51**, 1677 (2003).
- [13] S. Cripps, *RF Power Amplifiers for Wireless Communications*, Artech House microwave library (Artech House, 2006).
- [14] M. Acar, A. J. Annema, and B. Nauta, *Analytical Design Equations for Class-E Power Amplifiers*, IEEE Trans. on Circuits and Systems I: Regular Papers 54, 2706 (2007).
- [15] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, *The Class-E/F Family of ZVS Switching Amplifiers*, IEEE Trans. on Microw. Theory Techn. **51**, 1677 (2003).
- [16] M. Acar, A. J. Annema, and B. Nauta, Variable-Voltage Class-E Power Amplifiers, in 2007 IEEE/MTT-S International Microwave Symposium (2007) pp. 1095–1098.
- [17] M. Iwadare, S. Mori, and K. Ikeda, Even harmonic resonant class E tuned power amplifier without RF choke, Electronics and Communications in Japan (Part I: Communications) **79**, 23 (1996), https://onlinelibrary.wiley.com/doi/pdf/10.1002/ecja.4410790103.
- [18] M. P. van der Heijden, M. Acar, J. S. Vromans, and D. A. Calvillo-Cortes, A 19W high-efficiency wide-band CMOS-GaN class-E Chireix RF outphasing power amplifier, in 2011 IEEE MTT-S International Microwave Symposium (2011) pp. 1–4.

- [19] M. P. van der Heijden and M. Acar, A radio-frequency reconfigurable CMOS-GaN class-E Chireix power amplifier, in 2014 IEEE MTT-S International Microwave Symposium (IMS2014) (2014) pp. 1–4.
- [20] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, *A Quadrature Switched Capacitor Power Amplifier*, IEEE J. of Solid-State Circuits **51**, 1200 (2016).
- [21] R. B. Staszewski, J. L. Wallberg, S. Rezeq, C.-M. Hung, O. E. Eliezer, S. K. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad, and D. Leipold, *All-Digital PLL and Transmitter for Mobile Phones*, IEEE J. of Solid-State Circuits 40, 2469 (2005).
- [22] P. T. M. van Zeijl and M. Collados, A Digital Envelope Modulator for a WLAN OFDM Polar Transmitter in 90 nm CMOS, IEEE J. of Solid-State Circuits 42, 2204 (2007).
- [23] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, A Digitally Modulated Polar CMOS Power Amplifier With a 20-MHz Channel Bandwidth, IEEE J. of Solid-State Circuits 43, 2251 (2008).
- [24] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology, IEEE J. of Solid-State Circuits 46, 1796 (2011).
- [25] L. Ye, J. Chen, L. Kong, E. Alon, and A. Niknejad, Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation, IEEE J. of Solid-State Circuits 48, 3160 (2013).
- [26] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, A Wideband 2 × 13-bit All-Digital I/Q RF-DAC, IEEE Trans. on Microw. Theory Techn. 62, 732 (2014).
- [27] J. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, A 24dBm 2-to-4.3GHz Wideband Digital Power Amplifier with Built-In AM-PM Distortion Self-Compensation, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 230–231.
- [28] M. Hashemi, Y. Shen, M. Mehrpoo, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, An Intrinsically Linear Wideband Digital Polar PA Featuring AM-AM and AM-PM Corrections Through Nonlinear Sizing, Overdrive-Voltage Control, and Multiphase RF Clocking, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 300–301.

- [29] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, *An Intrinsically Linear Wideband Polar Digital Power Amplifier*, IEEE Journal of Solid-State Circuits 52, 3312 (2017).
- [30] M. Hashemi, L. Zhou, Y. Shen, M. Mehrpoo, and L. de Vreede, *Highly efficient* and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and 40 dBc ACLR without DPD, in 2017 IEEE MTT-S International Microwave Symposium (IMS) (2017) pp. 2025–2028.
- [31] M. Hashemi, L. Zhou, Y. Shen, and L. C. N. de Vreede, A Highly Linear Wideband Polar Class-E CMOS Digital Doherty Power Amplifier, IEEE Transactions on Microwave Theory and Techniques 67, 4232 (2019).
- [32] V. Vorapipat, C. S. Levy, and P. M. Asbeck, *Voltage Mode Doherty Power Amplifier*, IEEE Journal of Solid-State Circuits **52**, 1295 (2017).
- [33] G. Hanington, P.-F. Chen, P. M. Asbeck, and L. E. Larson, *High-efficiency power amplifier using dynamic power-supply voltage for CDMA applications*, IEEE Transactions on Microwave Theory and Techniques **47**, 1471 (1999).
- [34] J. Staudinger, B. Gilsdorf, D. Newman, G. Norris, G. Sadowniczak, R. Sherman, and T. Quach, *High efficiency CDMA RF power amplifier using dynamic envelope tracking technique*, in 2000 IEEE MTT-S International Microwave Symposium Digest (Cat. No.00CH37017), Vol. 2 (2000) pp. 873–876 vol.2.
- [35] L. R. Kahn, *Single-Sideband Transmission by Envelope Elimination and Restoration*, Proceedings of the IRE **40**, 803 (1952).
- [36] H. Chireix, *High Power Outphasing Modulation*, Proceedings of the Institute of Radio Engineers 23, 1370 (1935).
- [37] J. H. Qureshi, M. J. Pelk, M. Marchetti, W. C. E. Neo, J. R. Gajadharsing, M. P. van der Heijden, and L. C. N. de Vreede, A 90-W Peak Power GaN Outphasing Amplifier With Optimum Input Signal Conditioning, IEEE Transactions on Microwave Theory and Techniques 57, 1925 (2009).
- [38] W. H. Doherty, A New High Efficiency Power Amplifier for Modulated Waves, Proceedings of the Institute of Radio Engineers 24, 1163 (1936).

- [39] F. H. Raab, Efficiency of Doherty RF Power-Amplifier Systems, IEEE Transactions on Broadcasting BC-33, 77 (1987).
- [40] R. J. McMorrow, D. M. Upton, and P. R. Maloney, *The microwave Doherty amplifier*, in 1994 IEEE MTT-S International Microwave Symposium Digest (Cat. No.94CH3389-4) (1994) pp. 1653–1656 vol.3.
- [41] J. H. Qureshi, N. Li, W. C. E. Neo, F. van Rijs, I. Blednov, and L. C. N. de Vreede, A wide-band 20W LMOS Doherty power amplifier, in 2010 IEEE MTT-S International Microwave Symposium (2010) pp. 1504–1507.
- [42] A. Grebennikov and J. Wong, A Dual-Band Parallel Doherty Power Amplifier for Wireless Applications, IEEE Transactions on Microwave Theory and Techniques 60, 3214 (2012).
- [43] N. Ryu, S. Jang, K. C. Lee, and Y. Jeong, CMOS Doherty Amplifier With Variable Balun Transformer and Adaptive Bias Control for Wireless LAN Application, IEEE Journal of Solid-State Circuits 49, 1356 (2014).
- [44] S. Hu, S. Kousai, J. S. Park, O. L. Chlieh, and H. Wang, Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS, IEEE Journal of Solid-State Circuits 50, 1094 (2015).
- [45] A. Cidronali, S. Maddio, N. Giovannelli, and G. Collodi, Frequency Analysis and Multiline Implementation of Compensated Impedance Inverter for Wideband Doherty High-Power Amplifier Design, IEEE Transactions on Microwave Theory and Techniques 64, 1359 (2016).

# 3

# NONLINEAR SYSTEMS AND DIGITAL PREDISTORTION

## **3.1.** INTRODUCTION

As discussed in previous sections, a conventionally efficient power amplifier is nonlinear, especially if it operates in class-E mode. The PA's nonlinearity in itself may not be a big issue in a single-channel communication system, because it can be compensated by post-distortion on the receiver (RX) side (except for some signal-to-noise ratio (SNR) degradation). This is especially true for up-link transmission where powerful processing can be done in the base station [1]. Such a system must deal with a varying channel estimation, nonetheless. However, since most communication systems are multi-channel and the RX has very strict requirements on the SNR, it is necessary to linearize the PA in advance to avoid any power leakage to adjacent channels, which would degrade the ACPR. At the same time, a high in-band SNR must also be ensured in favor of a low EVM.

In modern sub-6 GHz communication systems the input data to the TX is digital, which is why digital predistortion (DPD) is the most popular technique to linearize the PA. This is done in such a way that the overall input-output transfer function becomes linear with respect to input power (or amplitude). Since the DPD and the PA are both nonlinear systems, in the following section, first the behavioral modeling techniques are given and then the DPD techniques are discussed.

## **3.2.** Behavioral Modeling of Nonlinear Systems

### **3.2.1.** VOLTERRA SERIES

The PA and its predistorter are both nonlinear systems, which can be described by mathematical models. A continuous linear time-invariant (LTI) system can be described as:

$$y(t) = H_1[x(t)] = \int h_1(\tau) x(t-\tau) \, d\tau \tag{3.1}$$

where  $H_1[\cdot]$  is the linear system operator,  $h_1(t)$  is the linear kernel (i.e. linear impulse response), x(t) is the input and y(t) is the output. By extending this representation, second- and third-order operators are defined as:

$$H_2[x(t)] = \int_{\mathbb{R}^2} h_2(\tau_1, \tau_2) x(t - \tau_1) x(t - \tau_2) \, d\tau_1 d\tau_2 \tag{3.2}$$

$$H_3[x(t)] = \int_{\mathbb{R}^3} h_3(\tau_1, \tau_2, \tau_3) x(t - \tau_1) x(t - \tau_2) x(t - \tau_3) \, d\tau_1 d\tau_2 d\tau_3 \tag{3.3}$$





Figure 3.1: (a) Continuous-time model of a Volterra series, and (b) discrete-time implementation of a memory polynomial model.

Thus, by adding a series of these nonlinear operators, the Volterra series is formed by:

$$\begin{aligned} y(t) &= H_0[x(t)] + H_1[x(t)] + H_2[x(t)] + \dots + H_K[x(t)] = \sum_{k=0}^{K} H_k[x(t)] \\ &= \sum_{k=0}^{K} \int_{\mathbb{R}^k} h_k(\tau_1, \tau_2, \dots, \tau_k) \prod_{l=1}^{k} x(t - \tau_l) \, d\tau_1 \cdots d\tau_k \end{aligned} \tag{3.4}$$

where $H_0[x(t)]$ is a constant and, for $k = 1, 2, \dots, K$, $H_k[x(t)]$ is the $k^{th}$-order Volterra operator [2, 3]. The Volterra kernels should be causal, i.e. $h_k(\tau_1, ..., \tau_k) = 0$ for any $\tau_j < 0$, $j = 1, 2, ..., k$. A Volterra series is one of the most general models that can approximate a nonlinear system with high precision if K is large enough and the series converges. Figure 3.1a shows a block diagram of a Volterra series model. Similarly, the discrete-time equivalent of a Volterra series can be defined as follows [4, 5]:

$$y(n) = \sum_{k=1}^{K} \sum_{m_1=0}^{M} \cdots \sum_{m_k=0}^{M} h_k(m_1, \cdots, m_k) \prod_{l=1}^{k} x(n-m_l)$$
(3.5)

where *M* is the memory depth, i.e. the number of previous samples that are considered to contribute to the overall response. This model is linear with respect to its parameters.

In a conventional discrete-time Volterra series, the number of parameters (*P*) increases substantially with the increase of memory depth (*M*) and nonlinearity order (*K*), as $P = \sum_{k=1}^{K} (M+1)^k$. For example, while for a memoryless system with a nonlinearity order of 5, *P* equals 5, for a system with the same nonlinearity order but with a memory depth of 5, *P* increases to 9330, which limits the practical use of the Volterra series. Several techniques have been proposed to tackle this issue, such as dynamic deviation reduction [6, 7] and pruning the Volterra series [8], yet none show better performance for highly nonlinear systems over other simpler modeling techniques with the same number of parameters.
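The growth of the parameter count can be checked with a few lines of Python. This is only an illustrative sketch of the formula $P = \sum_{k=1}^{K} (M+1)^k$; the function name `volterra_param_count` is a hypothetical helper introduced here.

```python
def volterra_param_count(K: int, M: int) -> int:
    """Number of kernel coefficients of a discrete-time Volterra series,
    P = sum_{k=1}^{K} (M + 1)**k, for nonlinearity order K and memory depth M."""
    return sum((M + 1) ** k for k in range(1, K + 1))


# The two cases quoted in the text.
print(volterra_param_count(5, 0))  # memoryless, order 5 -> 5
print(volterra_param_count(5, 5))  # memory depth 5, order 5 -> 9330
```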

### **3.2.2.** Equivalent Baseband Model of a Volterra Series

The Volterra series model as formulated in (3.4-3.5) is applicable to real signals (e.g. modulated passband signals), but this says little about the baseband behavior of a nonlinear RF system in this form. In other words, we are mostly interested in modeling the conversion from the input baseband signal to the output passband signal around the fundamental frequency ( $\omega_0$ ). In this case, the input RF signal can be represented as:

$$x(t) = \frac{\widehat{x}(t)e^{j\omega_0 t} + \widehat{x}(t)^* e^{-j\omega_0 t}}{2} = \widehat{I}_x(t)\cos(\omega_0 t) - \widehat{Q}_x(t)\sin(\omega_0 t) \tag{3.6}$$

where  $\hat{}$  denotes the baseband signal, and \* denotes the conjugate. Therefore, the second power of x(t) is calculated as follows:

$$x(t)^{2} = \frac{2|\hat{x}(t)|^{2} + \hat{x}(t)^{2}e^{j2\omega_{0}t} + \hat{x}(t)^{*2}e^{-j2\omega_{0}t}}{4}$$
(3.7)

where $|\hat{x}(t)|^2 = \hat{x}(t)\hat{x}(t)^* = \hat{I}_x(t)^2 + \hat{Q}_x(t)^2$ is the squared amplitude of the baseband input signal. As can be seen, no term is generated around the fundamental frequency $\omega_0$. It can be easily shown that none of the even powers of x(t) generates any term at $\omega_0$. Therefore, in an equivalent baseband model, the even terms can be removed. Likewise, the third power of x(t) is calculated as:

$$\begin{aligned} x(t)^{3} &= \frac{3\widehat{x}(t)^{2}\widehat{x}(t)^{*}e^{j\omega_{0}t} + 3\widehat{x}(t)\widehat{x}(t)^{*2}e^{-j\omega_{0}t} + \widehat{x}(t)^{3}e^{j3\omega_{0}t} + \widehat{x}(t)^{*3}e^{-j3\omega_{0}t}}{8} \\ &= \frac{3|\widehat{x}(t)|^{2}\widehat{x}(t)e^{j\omega_{0}t} + 3|\widehat{x}(t)|^{2}\widehat{x}(t)^{*}e^{-j\omega_{0}t} + \widehat{x}(t)^{3}e^{j3\omega_{0}t} + \widehat{x}(t)^{*3}e^{-j3\omega_{0}t}}{8} \end{aligned} \tag{3.8}$$

Accordingly, the passband signal around the fundamental frequency  $\omega_0$  is given as:

$$\begin{aligned} \widetilde{x}(t)^{3} &= \frac{3\widehat{x}(t)|\widehat{x}(t)|^{2}e^{j\omega_{0}t} + 3\widehat{x}(t)^{*}|\widehat{x}(t)|^{2}e^{-j\omega_{0}t}}{8} \\ &= \frac{3}{4}\mathrm{Re}\left\{\widehat{x}(t)|\widehat{x}(t)|^{2}e^{j\omega_{0}t}\right\} \end{aligned} \tag{3.9}$$

This means that third-order nonlinearity transforms the input baseband signal into an equivalent output baseband signal of  $\hat{x}(t)|\hat{x}(t)|^2$  around  $\omega_0$ . This can be easily extended to any odd  $k^{th}$ -order of nonlinearity, generating an equivalent baseband term of  $\hat{x}(t)|\hat{x}(t)|^{k-1}$ . Therefore, in a system where the input baseband signal x(n) is digital (discrete-time), the odd  $k^{th}$ -order terms with a delay of m would result in an equivalent baseband term of  $x(n-m)|x(n-m)|^{k-1}$ . Using a similar approach, the general form of discrete-time baseband Volterra series can be formulated as:

$$y(n) = \sum_{k=1}^{K} \sum_{m_1=0}^{M} \cdots \sum_{m_{2k-1}=0}^{M} h_{2k-1}(m_1, \cdots, m_{2k-1}) \prod_{l=1}^{k} x(n-m_l) \prod_{l=k+1}^{2k-1} x(n-m_l)^*$$
(3.10)

where  $h_{2k-1}(m_1, \dots, m_{2k-1})$  are complex numbers. As shown here theoretically, the equivalent baseband model of a PA's nonlinearity only includes the odd-order terms. However, in practice by also including the even-order terms in the model, better overall accuracy can be achieved for the same number of parameters. This also allows the use of lower-order terms, which helps with numerical computations [9].
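The fundamental-zone results of (3.7) and (3.9) can be checked numerically. The sketch below assumes a constant complex envelope (a CW tone) and extracts the component at $\omega_0$ by projecting onto $e^{-j\omega_0 t}$ over an integer number of carrier periods; the helper `fundamental_envelope` and all numerical values are illustrative assumptions.

```python
import cmath
import math


def fundamental_envelope(samples, n_per_period):
    """Complex envelope at the carrier frequency of a real passband signal,
    obtained by projection onto exp(-j*w0*t) over whole carrier periods."""
    acc = 0j
    for i, s in enumerate(samples):
        acc += s * cmath.exp(-2j * math.pi * i / n_per_period)
    return 2 * acc / len(samples)


x_hat = 0.8 * cmath.exp(1j * 0.6)   # assumed constant baseband envelope
n_per_period, n_periods = 64, 4     # assumed sampling grid

# Passband signal x(t) = Re{x_hat * exp(j*w0*t)}, as in (3.6).
passband = [
    (x_hat * cmath.exp(2j * math.pi * i / n_per_period)).real
    for i in range(n_per_period * n_periods)
]

env_square = fundamental_envelope([v ** 2 for v in passband], n_per_period)
env_cube = fundamental_envelope([v ** 3 for v in passband], n_per_period)
expected_cube = 0.75 * x_hat * abs(x_hat) ** 2   # (3/4) * x_hat * |x_hat|^2

print(abs(env_square))               # even power: no term at w0 (numerically ~0)
print(abs(env_cube - expected_cube)) # odd power: matches (3.9) (numerically ~0)
```

The factor 3/4 appears because the projection returns the complex envelope itself, i.e. the quantity inside the real-part operator of (3.9).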

### **3.2.3.** Memory Polynomial Model

A memory polynomial (MP) model is a special type of Volterra series where only the diagonal terms are kept and all of the cross-terms (i.e. the terms including a product of input at different times/samples) are removed, resulting in a much simpler mathematical model with much fewer parameters. The equivalent baseband model is formulated as:

$$y(n) = \sum_{k=1}^{K} \sum_{m=0}^{M} a_{km} x(n-m) |x(n-m)|^{k-1}$$
(3.11)

where $a_{km}$ are the model coefficients, which are complex numbers in general. This model is also linear with respect to its coefficients. The number of coefficients of (3.11) is equal to $P = K \times (M + 1)$. Thus, for a memory depth of 5 and nonlinearity order of 5, the model would need 30 coefficients, which is significantly smaller than its Volterra series counterpart. In Fig. 3.1b, a block diagram of the implementation of the discrete-time MP model is shown [10].
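As an illustration, a direct (unoptimized) evaluation of (3.11) could look as follows. This is a minimal sketch, not the implementation of Fig. 3.1b: the function name, the coefficient layout `coeffs[k-1][m]`, and the choice to treat samples before the start of the record as zero are all assumptions made here.

```python
def memory_polynomial(x, coeffs):
    """Evaluate y(n) = sum_k sum_m a_km * x(n-m) * |x(n-m)|**(k-1).

    x      : list of complex baseband input samples
    coeffs : coeffs[k-1][m] holds a_km for k = 1..K and m = 0..M
    Samples before the start of the record are treated as zero (assumption).
    """
    K = len(coeffs)
    M = len(coeffs[0]) - 1
    y = []
    for n in range(len(x)):
        acc = 0j
        for m in range(M + 1):
            if n - m < 0:
                continue  # no history available yet
            x_m = x[n - m]
            for k in range(1, K + 1):
                acc += coeffs[k - 1][m] * x_m * abs(x_m) ** (k - 1)
        y.append(acc)
    return y


# Hypothetical example: K = 3, M = 1, mild third-order compression, no memory.
coeffs = [[1.0, 0.0], [0.0, 0.0], [-0.1, 0.0]]
print(memory_polynomial([1 + 0j, 2j], coeffs))  # approximately [0.9, 1.2j]
print(len(coeffs) * len(coeffs[0]))             # P = K * (M + 1) = 6
```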

### **3.2.4.** GENERALIZED MEMORY POLYNOMIAL MODEL

By adding to the memory polynomial model some of the cross-terms from the Volterra series, namely those that correspond to a lag or lead between the input signal and its envelope, the accuracy of the model can be improved at the cost of a higher complexity in terms of the number of coefficients. This modified MP model is known as the generalized memory polynomial (GMP) model [4], and is defined as:

$$y(n) = \sum_{k=1}^{K_a - 1} \sum_{m=0}^{M_a - 1} a_{km} x(n-m) |x(n-m)|^{k-1} + \sum_{k=1}^{K_b} \sum_{m=0}^{M_b - 1} \sum_{l=1}^{L_b} b_{kml} x(n-m) |x(n-m-l)|^k + \sum_{k=1}^{K_c} \sum_{m=0}^{M_c - 1} \sum_{l=1}^{L_c} c_{kml} x(n-m) |x(n-m+l)|^k$$
(3.12)

where  $a_{km}$ ,  $b_{kml}$ , and  $c_{kml}$  are the coefficients of the GMP model denoting the aligned terms, lagging, and leading cross-term, respectively.  $K_a$ ,  $K_b$ , and  $K_c$  are the nonlinearity orders.  $M_a$ ,  $M_b$ , and  $M_c$  are the memory depths.  $L_b$ , and  $L_c$  denote the lagging and leading delay length, respectively.

### **3.2.5.** BASEBAND MODEL OF ACW-AM AND ACW-PM CONVERSION

The baseband input signal can be expressed in a polar representation as  $x(n) = |x(n)|e^{j\phi(n)}$ . Thus, the MP model of (3.11) can be rewritten and rearranged as:

$$\begin{aligned} y(n) &= \sum_{m=0}^{M} \sum_{k=1}^{K} a_{km} |x(n-m)|^{k} e^{j\phi(n-m)} \\ &= e^{j\phi(n)} \sum_{k=1}^{K} a_{k0} |x(n)|^{k} + e^{j\phi(n-1)} \sum_{k=1}^{K} a_{k1} |x(n-1)|^{k} + \dots + e^{j\phi(n-M)} \sum_{k=1}^{K} a_{kM} |x(n-M)|^{k} \\ &= e^{j\phi(n)} R_{0}(|x(n)|) + e^{j\phi(n-1)} R_{1}(|x(n-1)|) + \dots + e^{j\phi(n-M)} R_{M}(|x(n-M)|) \end{aligned} \tag{3.13}$$

where  $R_m(|x(n-m)|) = \sum_{k=1}^{K} a_{km} |x(n-m)|^k$  is a complex function because of the complex  $a_{km}$  coefficients. Therefore, by dividing the above equation by the input phase  $e^{j\phi(n)}$ , the output amplitude  $(AM_y)$  and phase-error  $(\Phi_{y,E})$  as functions of the input amplitude and phase are calculated as:

$$AM_{y}(|x|) = \Big| \sum_{m=0}^{M} R_{m}(|x_{m}|) e^{j(\phi_{m} - \phi_{0})} \Big|$$
(3.14)

$$\Phi_{y,E}(|x|) = \angle \left(\sum_{m=0}^{M} R_m(|x_m|) e^{j\left(\phi_m - \phi_0\right)}\right)$$
(3.15)

where $|x_m|$ and $\phi_m$ are the input amplitude and phase of the $m$-th preceding sample, respectively. As can be seen from the above equations, the $e^{j(\phi_m - \phi_0)}$ term cannot be factored out in general. Therefore, the Volterra-based models suggest that, in a nonlinear PA with non-negligible memory effects, the distortion in the output amplitude and phase depends not only on the amplitude of the previous input samples, but also on their phase difference compared to the current sample. Although this seems counter-intuitive and is normally not expected (as the names AM-AM and AM-PM suggest, these curves are normally considered to be functions of the input amplitude only), it can be intuitively understood by recognizing that the amplitude and phase of the summation of two vectors (e.g. signal samples) depend on both of their amplitudes and phases. Interestingly, by assuming a memoryless (M = 0) or quasi-memoryless [9] (with a relatively narrowband input signal) PA, the static AM-AM ( $R_{AM-AM}$ ) and AM-PM ( $\Phi_{AM-PM}$ ) conversions are calculated as:

$$R_{AM-AM}(|x|) = \Big|\sum_{k=1}^{K} a_k |x|^k \Big|$$
(3.16)

$$\Phi_{AM-PM}(|x|) = \angle \left(\sum_{k=1}^{K} a_k |x|^k\right)$$
(3.17)

which are, as expected, only dependent on the input amplitude.
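The difference between the static curves of (3.16)-(3.17) and the memory case of (3.14)-(3.15) can be illustrated with a short sketch. The function names and all coefficient values below are hypothetical; the per-delay layout `coeffs[m][k-1]` for $a_{km}$ is an assumption of this example.

```python
import cmath


def static_am_am_pm(amplitude, a):
    """Static AM-AM and AM-PM of (3.16)-(3.17); a[k-1] holds the complex a_k."""
    r = sum(a_k * amplitude ** k for k, a_k in enumerate(a, start=1))
    return abs(r), cmath.phase(r)


def dynamic_am_pm(x_hist, coeffs):
    """Output amplitude and phase error of (3.14)-(3.15).

    x_hist : [x(n), x(n-1), ..., x(n-M)] complex input samples
    coeffs : coeffs[m][k-1] holds a_km (assumed layout)
    """
    phi_0 = cmath.phase(x_hist[0])
    total = 0j
    for x_m, a_m in zip(x_hist, coeffs):
        r_m = sum(a_km * abs(x_m) ** k for k, a_km in enumerate(a_m, start=1))
        total += r_m * cmath.exp(1j * (cmath.phase(x_m) - phi_0))
    return abs(total), cmath.phase(total)


# Static case: depends on the input amplitude only.
print(static_am_am_pm(1.0, [1.0, 0.0, -0.2 + 0.1j]))

# Memory case (K = 1, M = 1): same amplitudes, different previous-sample phase.
coeffs = [[1.0], [0.2]]
print(dynamic_am_pm([1 + 0j, 1 + 0j], coeffs)[0])   # in phase: about 1.2
print(dynamic_am_pm([1 + 0j, -1 + 0j], coeffs)[0])  # opposite phase: about 0.8
```

The two memory cases share identical input amplitudes, yet the output amplitude differs, which is the vector-summation effect described above.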

### **3.2.6.** GENERAL MATHEMATICAL MODEL

With respect to memory effects, the Volterra series-based discrete-time models only make use of input samples, similar to finite-impulse response (FIR) filters in an LTI system. However, it is possible to include the effect of output samples as well in the model, similar to infinite-impulse response (IIR) filters in an LTI system. Therefore, a more general mathematical model is proposed here that is formulated to also include the output samples, as:

$$y(n) = \sum_{p=1}^{P} \alpha_p \psi_p[\bar{x}(n), \bar{y}(n)] \tag{3.18}$$

where $\bar{x}(n) = [x(n), x(n-1), \dots, x(n-M_x)]^T$ and $\bar{y}(n) = [y(n), y(n-1), \dots, y(n-M_y)]^T$ are the input and output vectors, $M_x/M_y$ are the memory depths of the input/output signals, $\psi_p[\cdot]$ is the basis function, *P* is the total number of parameters, and $\alpha_p$ is the coefficient of each basis function. For example, by setting $\psi_p[\bar{x}(n), \bar{y}(n)] = x(n-m)|x(n-m)|^{2k}$ (the output is independent of the previous output samples), (3.18) simplifies into an MP model, where p = k(m+1) is a unique index corresponding to each basis function with a memory depth of *m* and nonlinearity order of *k*. In general, $\psi_p[\cdot]$ is not necessarily limited to polynomial functions, and can in fact be any type of function. In [11, 12], special cases of this general form are used to make use of rational functions by defining the output as the ratio of two memory polynomials.

### **3.2.7.** New Basis Functions Proposals for a Switch-Mode DPA

Here we introduce a new basis function that fits very well with the ACW-AM curve of a class-E DPA. As mentioned in Section 2.3, the output amplitude of a switch-mode amplifier can be modeled as $y \propto 1/(1 + R_{ON}/R_L)$. Assuming $R_{ON} = R_0/x$ and defining $a_n = R_0/R_L$, the normalized output can be defined as $y = (a_n+1)x/(a_n+x)$, where both x and y vary between 0 and 1. Thus, a set of new basis functions can be introduced to model the ACW/AM-AM conversion curve, as:

$$\psi_{i,AM-AM}[\bar{x}] = \frac{(a_k+1)|x(n-m)|}{a_k+|x(n-m)|}, \qquad |y(n)| = \sum_{i=0}^{K_w} \alpha_i \psi_{i,AM-AM}[\bar{x}] \tag{3.19}$$

where $a_k = \beta / k$, $\beta$ is a fitting parameter, and k is the order of $\psi_{i,AM-AM}$. Accordingly, for DPD applications, the inverse of these functions can also be defined as:

$$\psi_{i,AM-AM^{-1}}[\bar{x}] = \frac{a_k|x(n-m)|}{a_k + 1 - |x(n-m)|} \tag{3.20}$$

Figure 3.2: (a) Different orders of the proposed basis functions $\psi_{i,AM-AM}$ and $\psi_{i,AM-AM^{-1}}$, and (b) an example of modeling a highly nonlinear ACW-AM curve using the MP model (order of 11) and the proposed model (order of 3).

Figure 3.2a plots different orders of these functions. In the example shown in Fig. 3.2b, compared to the MP model with a nonlinearity order of 11 (with 6 kernels), the proposed model with only 3 kernels results in much higher accuracy, by a factor of 10 (i.e. 10 dB lower normalized mean-square error (NMSE)).
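The basis function of (3.19) and its inverse (3.20) are simple enough to verify directly. The sketch below evaluates both for normalized amplitudes in [0, 1] and checks that they undo each other; the function names and the value $\beta = 1$ are assumptions chosen only for illustration.

```python
def psi_am_am(x, a):
    """Proposed basis function (3.19): y = (a + 1) * x / (a + x), 0 <= x <= 1."""
    return (a + 1) * x / (a + x)


def psi_am_am_inv(y, a):
    """Inverse basis function (3.20): x = a * y / (a + 1 - y), 0 <= y <= 1."""
    return a * y / (a + 1 - y)


beta = 1.0                               # hypothetical fitting parameter
for k in (1, 2, 3):                      # order of the basis function
    a_k = beta / k
    y = psi_am_am(0.5, a_k)              # compressive: y > x for 0 < x < 1
    x_back = psi_am_am_inv(y, a_k)       # the inverse recovers the input
    print(k, round(y, 4), round(x_back, 4))
```

Both functions map 0 to 0 and 1 to 1, so a weighted sum of them stays anchored at the end points of the normalized ACW-AM curve when the weights sum to one.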

### **3.2.8.** System Identification by LS Algorithm

Any nonlinear system modeled by a series of basis functions, which is linear with respect to the coefficients of the basis functions, can be identified in terms of its coefficients using the linear least-square (LS) algorithm [13, 14]. Knowing the input data, the LS algorithm is based on fitting a behavioral function to a collection of observed output data by minimizing the sum of the squared residuals. Therefore, this algorithm works better if more data samples are available. For a collection of (N + 1) data samples, the input/output relation of the model can be expressed compactly in a matrix form as:

$$\check{\mathbf{y}} = \mathbf{X}\boldsymbol{\alpha} \tag{3.21}$$

where $\check{\mathbf{y}}$ is an $(N+1) \times 1$ vector of estimated output data, and $\boldsymbol{\alpha}$ is a $P \times 1$ vector including the *P* coefficients of the basis functions. **X** is an $(N+1) \times P$ matrix, in which the elements in row *i* are the outputs of the basis functions for an input of x(n - i + 1). The simpler version of the general model of (3.18), with only the input samples as the input of $\psi$, can be generally expressed in matrix form as:

$$\begin{bmatrix} \check{y}(n) \\ \check{y}(n-1) \\ \vdots \\ \check{y}(n-N) \end{bmatrix} = \begin{bmatrix} \psi_1(x(n)) & \psi_2(x(n)) & \cdots & \psi_P(x(n)) \\ \psi_1(x(n-1)) & \psi_2(x(n-1)) & \cdots & \psi_P(x(n-1)) \\ \vdots & \vdots & \ddots & \vdots \\ \psi_1(x(n-N)) & \psi_2(x(n-N)) & \cdots & \psi_P(x(n-N)) \end{bmatrix} \cdot \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_P \end{bmatrix}$$
(3.22)

Accordingly, the MP model can be expressed in matrix form as:

$$\begin{bmatrix} \check{y}(n) \\ \check{y}(n-1) \\ \vdots \\ \check{y}(n-N) \end{bmatrix} = \begin{bmatrix} x(n) & \cdots & x(n-M) & \cdots & x(n-M)|x(n-M)|^{K-1} \\ x(n-1) & \cdots & \cdots & x(n-1-M)|x(n-1-M)|^{K-1} \\ \vdots & \ddots & \ddots & \vdots \\ x(n-N) & \cdots & \cdots & x(n-N-M)|x(n-N-M)|^{K-1} \end{bmatrix} \cdot \begin{bmatrix} a_{10} \\ a_{11} \\ \vdots \\ a_{KM} \end{bmatrix}$$
(3.23)

To extract the coefficients (i.e. vector  $\alpha$ ), we need the measured output data as well, which can be expressed as  $\mathbf{y} = \begin{bmatrix} y(n) & y(n-1) & \cdots & y(n-N) \end{bmatrix}^{T}$ . Therefore, the estimation error vector is written as:

$$\mathbf{e} = \mathbf{y} - \check{\mathbf{y}} \tag{3.24}$$

With the vector **y** and matrix **X**, it can be shown that the optimum solution minimizing the squared norm of the error vector, $||\mathbf{e}||^2$, is given by:

$$\boldsymbol{\alpha} = \left(\mathbf{X}^{\mathbf{H}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathbf{H}}\mathbf{y} \tag{3.25}$$

where  $X^H$  is the Hermitian (conjugate) transpose of X. Some variants of the LS algorithm are more suitable for adaptive identification, such as the 'damped' Newton algorithm formulated as [4]:

$$\boldsymbol{\alpha}_{b+1} = \boldsymbol{\alpha}_b + \mu (\mathbf{X}^{\mathbf{H}} \mathbf{X})^{-1} \mathbf{X}^{\mathbf{H}} \mathbf{e} \tag{3.26}$$

where *b* is the block index and  $\mu$  is the relaxation constant which sets the convergence rate ( $\mu < 1$ ). This introduces some memory into parameter extraction from one block to the next, which makes it robust to measurement errors.
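The identification procedure of (3.23)-(3.26) can be sketched on synthetic data. The example below uses only the Python standard library, so the normal equations are solved with a small hand-written Gaussian elimination; in practice a numerical library would normally be used. All names, the coefficient values, the noise-free data, and the reuse of one data block for every damped Newton step are assumptions of this sketch.

```python
import random


def mp_row(x, n, K, M):
    """Regressors of (3.23) for output sample n, ordered a_10, a_11, ..., a_KM."""
    return [x[n - m] * abs(x[n - m]) ** (k - 1)
            for k in range(1, K + 1) for m in range(M + 1)]


def solve_linear(A, b):
    """Solve A z = b for a small complex system (partial-pivot elimination)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0j] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def least_squares(X, y):
    """alpha = (X^H X)^{-1} X^H y, as in (3.25)."""
    P = len(X[0])
    XhX = [[sum(row[i].conjugate() * row[j] for row in X) for j in range(P)]
           for i in range(P)]
    Xhy = [sum(row[i].conjugate() * yi for row, yi in zip(X, y))
           for i in range(P)]
    return solve_linear(XhX, Xhy)


def damped_newton_step(alpha, X, y, mu):
    """alpha_{b+1} = alpha_b + mu (X^H X)^{-1} X^H e, as in (3.26)."""
    e = [yi - sum(r * a for r, a in zip(row, alpha)) for row, yi in zip(X, y)]
    return [a + mu * d for a, d in zip(alpha, least_squares(X, e))]


# Synthetic, noise-free "PA" described by an MP model with K = 3, M = 1.
rng = random.Random(0)
K, M = 3, 1
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
alpha_true = [1.0 + 0.1j, 0.05 - 0.02j, 0.1j, 0.01 + 0j, -0.2 + 0.05j, 0.02j]
X = [mp_row(x, n, K, M) for n in range(M, len(x))]
y = [sum(r * a for r, a in zip(row, alpha_true)) for row in X]

alpha_ls = least_squares(X, y)            # one-shot LS estimate

alpha_dn = [0j] * len(alpha_true)         # adaptive estimate, started from zero
for _ in range(40):                       # same block reused (assumption)
    alpha_dn = damped_newton_step(alpha_dn, X, y, mu=0.5)

print(max(abs(a - b) for a, b in zip(alpha_ls, alpha_true)))
print(max(abs(a - b) for a, b in zip(alpha_dn, alpha_true)))
```

With noise-free data and a fixed block, each damped Newton step removes a fraction $\mu$ of the remaining coefficient error, so the estimate approaches the one-shot LS solution geometrically.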

|      | DPD Property |              |              |              |              |              |          |
|------|--------------|--------------|--------------|--------------|--------------|--------------|----------|
| Type | Fixed        | Adaptive     | Memoryless   | Memory       | LUT          | Math.        | Ref.     |
| 1    | $\checkmark$ | -            | $\checkmark$ | -            | $\checkmark$ | -            | [15]     |
| 2    | $\checkmark$ | -            | $\checkmark$ | -            | -            | $\checkmark$ | [16]     |
| 3    | $\checkmark$ | -            | -            | $\checkmark$ | $\checkmark$ | -            | -        |
| 4    | $\checkmark$ | -            | -            | $\checkmark$ | -            | $\checkmark$ | -        |
| 5    | -            | $\checkmark$ | $\checkmark$ | -            | $\checkmark$ | -            | [17]     |
| 6    | -            | $\checkmark$ | $\checkmark$ | -            | -            | $\checkmark$ | [18]     |
| 7    | -            | $\checkmark$ | -            | $\checkmark$ | $\checkmark$ | -            | [19, 20] |
| 8    | -            | $\checkmark$ | -            | $\checkmark$ | -            | $\checkmark$ | [21, 22] |

Table 3.1: Eight different types of DPD.

## **3.3.** DIGITAL PREDISTORTION

A digital predistorter is a nonlinear system which is placed in front of the PA to linearize it, as shown conceptually in Fig. 3.3a. In general, the order of the DPD can be different (higher or lower) from the nonlinearity order of the PA. In the example shown in Fig. 3.3b, the memoryless polynomial nonlinearity order of the PA is 5, while a DPD with an order of at least 9 is used to linearize it. In memory DPD, the correction of each output sample depends not only on the current input sample, but also on the previous samples. One can find various DPD techniques in the literature, which can generally be categorized according to three criteria:

- 1. Fixed or Adaptive DPD
- 2. Memoryless or Memory DPD
- 3. Lookup-Table (LUT) or Mathematical DPD

Based on the above list and the possible combinations, eight different types of DPD can be identified, as summarized in Table 3.1.

### ADAPTIVE DPD

For base station or handheld applications where the output power and/or the signal bandwidth are very high, due to the variations in temperature, antenna load, PA environment, or signal properties, fixed DPD may not be adequate to keep the PA in linear operation. For example, a change in the probability density function (PDF) or PAPR of the signal can change the DC power consumption, resulting in a different temperature which can alter the nonlinearity parameters of the PA. Therefore, adaptive DPD is normally utilized to update the DPD parameters by sampling and monitoring the output signal and comparing it with the input to estimate the DPD and/or PA model parameters. Such a system can be very complicated, requiring many system- and circuit-level resources, and can consume a substantial amount of power because of the typically high-speed ADCs. Figure 3.4 shows a transmitter with conventional adaptive DPD. Although there are some adaptive DPD techniques implemented using lookup-tables [19, 20], in general, mathematical DPD fits better into an adaptive system since only a limited number of coefficients need to be calculated and updated each time instead of completely reprogramming memory.

Figure 3.3: (a) Basic concept of predistortion, (b) example of the input/output spectrum of a memoryless nonlinear PA (5<sup>th</sup>-order), with and without DPD (9<sup>th</sup>-order).

Figure 3.4: Conventional adaptive DPD.



Figure 3.5: (a) Data DPD vs. signal DPD, and (b) the resulting output spectrum using different interpolation filters.

### SIGNAL DPD VS. DATA DPD

As mentioned in Chapter 2, in an RFDAC/DPA-based digital TX, we prefer to increase the sampling rate as much as possible in order to push the sampling spectral replicas (SSRs) far away from the carrier frequency. Thus, the input data should be upsampled as much as possible. As such, the DPD can be placed either before the upsampler (i.e. data DPD) or after the upsampler (i.e. signal DPD) [23]. Similar to the Cartesian/polar upsampling discussion in Chapter 2, since the DPD is a nonlinear block, it is preferable to upsample the input signal before applying predistortion. Otherwise, the linear upsampler/interpolator generates intermediate samples that are not the exact predistorted (inverse) values of the corresponding upsampled input signal. Therefore, as shown in Fig. 3.5, by upsampling after DPD, the EVM and ACPR degrade, especially if higher-order interpolation filters are used to further suppress the SSRs.

### LUT-DPD

For most narrowband or low-power applications, fixed (static) memoryless DPD suffices, which can be implemented simply by means of a lookup table. In a (digital) polar TX or conventional analog TX, two one-dimensional (1D) LUT-DPDs holding the inverses of the ACW-AM and ACW-PM curves are normally used, resulting in a memory complexity of $O(x)$. However, in a direct digital Cartesian TX, because of the interaction between the I and Q RFDACs, a two-dimensional (2D) LUT-DPD is needed, which results in a higher memory complexity of $O(x^2)$. In a digital polar TX, the ACW-AM conversion of the DPA is linearized, thus the contents of the ACW-PM LUT-DPD are programmed based on the normalized output amplitude as the horizontal axis.

Figure 3.6: Polyphase LUT-DPD.

Since signal DPD requires a high-speed configuration, it can be implemented using interleaving/polyphase techniques [24], similar to a polyphase upsampler/interpolator [25]. For example, assuming that only 500 MHz memory is available, by using 4× polyphase memoryless LUTs as shown in Fig. 3.6, the overall sample rate of the LUT-DPD can be increased to 2 GS/s.
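The following is a minimal sketch of a memoryless 1D amplitude LUT-DPD served by four interleaved LUT copies. The compressive ACW-AM curve, the 256-entry table size, the nearest-entry lookup, and all function names are illustrative assumptions rather than the implementation used in this work.

```python
import math

def acw_am(acw_norm):
    """Hypothetical compressive ACW-AM curve (normalized code -> normalized amplitude)."""
    return math.sin(0.5 * math.pi * acw_norm)

def build_inverse_lut(curve, size):
    """Tabulate the inverse of a monotonic curve by bisection, with one entry
    per desired output amplitude on a uniform grid in [0, 1]."""
    lut = []
    for i in range(size):
        target = i / (size - 1)
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if curve(mid) < target:
                lo = mid
            else:
                hi = mid
        lut.append(0.5 * (lo + hi))
    return lut

def lut_lookup(lut, amplitude):
    """Nearest-entry lookup for a normalized amplitude in [0, 1]."""
    return lut[round(amplitude * (len(lut) - 1))]

def polyphase_lut_dpd(lut, samples, phases=4):
    """Branch p handles samples p, p + phases, p + 2*phases, ..., so each LUT
    copy only has to run at 1/phases of the overall sample rate."""
    out = [0.0] * len(samples)
    for p in range(phases):
        for n in range(p, len(samples), phases):
            out[n] = lut_lookup(lut, samples[n])
    return out

lut = build_inverse_lut(acw_am, 256)
samples = [n / 19 for n in range(20)]            # desired output amplitudes
predistorted = polyphase_lut_dpd(lut, samples)
linearized = [acw_am(v) for v in predistorted]

max_error = max(abs(a - b) for a, b in zip(linearized, samples))
max_error_no_dpd = max(abs(acw_am(s) - s) for s in samples)
overall_rate_hz = 4 * 500e6                      # four 500 MHz memories -> 2 GS/s
```

Because the LUT is memoryless, the interleaved branches produce exactly the same output as a single LUT running at the full rate; only the clock rate per memory changes.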

### TWO-STAGE HYBRID DPD

Normally, only one type of DPD is used to linearize a transmitter, which is typically a memory mathematical DPD based on the MP or GMP model. However, if there is a discontinuity in the derivative of the transfer function of a nonlinear system, even a very high-order mathematical model cannot accurately estimate its behavior around that point. An example of such a highly nonlinear system is the class-E Doherty DPA, whose ACW-AM and ACW-PM curves (see Fig. 2.10) have discontinuous derivatives at the transition point. This necessitates the additional use of LUT-DPD to reduce the burden on the memory DPD. Fig. 3.7 depicts a two-stage hybrid DPD for a polar TX, in which the combination of the LUT-DPD and the PA can be seen as a statically linear system, the dynamic nonlinearities of which are corrected by the memory mathematical DPD (e.g. MP).

### **3.3.1.** MATHEMATICAL DPD MODEL EXTRACTION

We know that, given the input and output signals of a nonlinear system, the linear LS algorithm can be used to estimate its behavioral model. However, to linearize the PA, we need to find the model parameters of the DPD as the pre-inverse of the PA.

Figure 3.7: Two-stage hybrid DPD (memory mathematical DPD with LUT) for a polar TX: (a) block diagram, and (b) output spectra.

This is not a trivial problem, as initially we do not know exactly what the output of the DPD should be to ensure that the baseband output of the PA will be a scaled copy of the TX input signal. If we knew the ideal output of the DPD for a given input signal, then it would become a classic identification problem which could be solved easily using the linear LS algorithm. In Chapter 6, a novel DPD technique is proposed to address this issue, in which the optimum output of the DPD is found, allowing the LS algorithm to be used directly to extract the DPD model parameters. Conventionally, however, there are two general learning/model-extraction techniques for estimating the DPD parameters, namely *indirect-learning DPD* and *direct-learning DPD*. In the following, these two techniques are described briefly.

# **3.3.2.** INDIRECT-LEARNING DPD

Since the ideal output of the DPD as the pre-inverse of the PA is not available, one solution is first to identify the post-inverse of the PA and then copy its model parameters to the DPD. The LS algorithm can then be used by taking the output of the PA as the input of the post-inverse model, and the input of the PA as its output. Since the DPD model is extracted indirectly, this technique is known as indirect-learning DPD, as depicted in Fig. 3.8a. However, theoretically speaking, nonlinear systems with memory effects generally cannot be permuted as LTI systems can. In other words, placing the post-inverse of a nonlinear filter in front of it does not guarantee complete linearization [23]. Nevertheless, it has been proved by Schetzen [26, 27] that the *p*th-order pre-inverse of a Volterra system is identical to its *p*th-order post-inverse. Therefore, the PA can be linearized to the *p*th order by using its *p*th-order post-inverse model as the DPD model.

Figure 3.8: (a) Indirect-learning DPD, and (b) direct-learning DPD.

The advantage of this technique is that the problem of finding the DPD parameters is reduced to finding only the PA's post-inverse model, which is a linear problem (i.e. the model is linear with respect to its coefficients) and can therefore be solved using the linear LS algorithm.
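The indirect-learning procedure can be sketched as follows, assuming NumPy is available. The memoryless 5th-order PA, its coefficients, the odd-order 9th-order basis, and the choice to drive the PA 10% harder during training (so that the training outputs cover the amplitude range the DPD will later see) are all illustrative assumptions.

```python
import numpy as np

def pa(u):
    """Hypothetical memoryless 5th-order PA in complex baseband (unity linear gain)."""
    r = np.abs(u)
    return u * (1.0 - 0.10 * r**2 - 0.02 * r**4)

def basis(x, order=9):
    """Odd-order memoryless polynomial basis: x, x|x|^2, ..., x|x|^(order-1)."""
    r = np.abs(x)
    return np.column_stack([x * r**p for p in range(0, order, 2)])

def nmse_db(ref, sig):
    """Normalized mean-square error between a reference and a signal, in dB."""
    return 10 * np.log10(np.sum(np.abs(ref - sig)**2) / np.sum(np.abs(ref)**2))

rng = np.random.default_rng(0)
n = 2000
x = 0.8 * rng.uniform(0, 1, n) * np.exp(2j * np.pi * rng.uniform(0, 1, n))

# Training: the PA output is the input of the post-inverse model,
# and the PA input is the output the post-inverse should produce.
u_train = 1.1 * x
z_train = pa(u_train)
coeffs, *_ = np.linalg.lstsq(basis(z_train), u_train, rcond=None)

# Copy the post-inverse coefficients into the predistorter and run the chain.
z_linearized = pa(basis(x) @ coeffs)

nmse_without_dpd = nmse_db(x, pa(x))
nmse_with_dpd = nmse_db(x, z_linearized)
```

The fit is a single linear LS solve because the post-inverse model is linear in its coefficients, which is the property this technique relies on.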

### **3.3.3.** DIRECT-LEARNING DPD

In this structure, the pre-inverse of the PA is estimated directly, as shown conceptually in Fig. 3.8b. However, the cascade of the DPD model, whose coefficients are to be identified, and the PA's nonlinear function creates a system model which is no longer linear with respect to the DPD model parameters (i.e. each DPD coefficient appears at the output of the PA model in higher orders). Therefore, it becomes a nonlinear optimization problem which cannot be solved directly with the linear LS algorithm. In this case, the nonlinear version of the LS algorithm is used, which is based on the Gauss–Newton algorithm [28]. It is formulated similarly to the damped Newton algorithm in (3.26), as an iterative adaptive LS algorithm. In this technique, the PA model is typically required and is estimated separately. In each iteration step of the adaptation algorithm, the PA model should first be estimated and updated, and then used in the LS estimation of the DPD parameters. It can be shown that the DPD parameters can be estimated iteratively as follows [29, 30]:

$$\alpha_{i+1} = \alpha_i + \frac{\mu}{h_0} \left( \mathbf{X}^{\mathbf{H}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathbf{H}} \mathbf{e}$$
(3.27)

where *i* is the iteration step, $\alpha$ is the $P \times 1$ parameter vector, $h_0$ is a single complex parameter representing the linear term of the PA model, $\mu$ is the step size which controls the convergence rate, and $\mathbf{e} = \mathbf{x} - \mathbf{z}$ is the error vector. As can be seen, direct-learning DPD is essentially an adaptive DPD which needs to be updated step by step. It should be noted here that a simple model for the PA is used to simplify the adaptation procedure, with the drawback of a less accurate adaptation gradient. However, in general, the PA model can be a completely nonlinear model, which can increase the complexity of the DPD drastically. Despite the additional computations for estimating the PA model, direct-learning DPD has the advantage of being more robust to feedback-path distortions and measurement noise, which can incorrectly bias the DPD parameters [23, 29].
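The update in (3.27) can be sketched as below, assuming NumPy is available. The sketch reads $\mathbf{x}$ as the TX input and $\mathbf{z}$ as the gain-normalized PA output, which is an interpretation of the error definition above; the memoryless 3rd-order PA, the 5th-order basis, $h_0 = 1$, $\mu = 0.5$, and the iteration count are illustrative assumptions.

```python
import numpy as np

def pa(u):
    """Hypothetical memoryless PA standing in for the separately estimated PA model."""
    return u * (1.0 - 0.1 * np.abs(u)**2)

def basis(x, order=5):
    """Odd-order memoryless polynomial basis: x, x|x|^2, ..., x|x|^(order-1)."""
    r = np.abs(x)
    return np.column_stack([x * r**p for p in range(0, order, 2)])

rng = np.random.default_rng(1)
n = 2000
x = 0.8 * rng.uniform(0, 1, n) * np.exp(2j * np.pi * rng.uniform(0, 1, n))

X = basis(x)                                  # DPD basis built from the TX input
alpha = np.zeros(X.shape[1], dtype=complex)
alpha[0] = 1.0                                # start from a pass-through DPD
h0 = 1.0                                      # linear term of the PA model (assumed known)
mu = 0.5                                      # step size

rms_errors = []
for _ in range(20):
    z = pa(X @ alpha)                         # PA output with the current DPD
    e = x - z                                 # error vector e = x - z
    rms_errors.append(np.sqrt(np.mean(np.abs(e)**2)))
    # lstsq returns (X^H X)^{-1} X^H e, the LS term of Eq. (3.27).
    ls_step, *_ = np.linalg.lstsq(X, e, rcond=None)
    alpha = alpha + (mu / h0) * ls_step
```

With the simple linear PA model $h_0$, each step scales the LS projection of the error by $\mu / h_0$, so the error shrinks gradually rather than in one shot, which matches the step-by-step adaptive character noted above.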

### **3.3.4.** SAMPLING RATE REQUIREMENT FOR DPD MODEL EXTRACTION

In an adaptive DPD with either direct or indirect learning, we need to sample and digitize the output signal using ADCs and compare it with the input signal to calculate the DPD parameters. According to the Nyquist theorem, since the output spectrum of the PA initially has a bandwidth of 5× the TX signal bandwidth, we need an effective sampling rate of 10× the input signal bandwidth to reconstruct the output signal alias-free in the digital domain. For example, for a TX signal bandwidth of 100 MHz, by directly following the Nyquist theorem, we should sample the output at a rate of 1 GS/s, which results in too much power consumption in the ADCs and hence a decrease in the overall system efficiency. Therefore, much effort has been made to reduce the required sampling rate of the ADCs, and thereby the overall cost and power consumption, using techniques such as undersampling and restoration [22], band-limited Volterra series [31], and spectral extrapolation [32]. These techniques, however, may require a significant amount of computational signal processing, resulting in extra cost and power consumption.
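The arithmetic behind the 1 GS/s figure can be written out as a short sketch; the function name and the default 5× spectral expansion factor are assumptions taken from the example above.

```python
def nyquist_feedback_rate(signal_bw_hz, expansion=5):
    """Sampling rate needed to capture the PA output alias-free, assuming the
    output occupies `expansion` times the TX signal bandwidth."""
    output_bw_hz = expansion * signal_bw_hz
    return 2 * output_bw_hz

rate_hz = nyquist_feedback_rate(100e6)   # 100 MHz TX signal bandwidth
```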

The Nyquist theorem has conventionally been seen as the theoretical foundation of the sampling requirements in adaptive DPDs. However, by generalizing the sampling theorem, Zhu has shown in [33] that for a memoryless one-to-one nonlinear mapping, if an inverse system exists such that the cascade of the two is a linear system, then the output of the nonlinear system can be reconstructed by sampling at twice the bandwidth of the *input* signal. In [34], Frank has shown that a general Volterra system with memory can also be identified by sampling the input and output at twice the maximum frequency of the *input* signal. The main idea behind these approaches is to reconstruct the output signal in the digital domain and then identify the PA or DPD model.

However, eventually it was recognized that we do not necessarily need to reconstruct the output signal for system identification. We simply need enough informative samples to identify the behavior of the PA. To understand how we can create a model based on the reduced sampling rate (i.e. reduced number of output samples), let us take a closer look at the MP model (3.22) as an example, rearranged as follows:

$$\mathbf{y} = \mathbf{X}\alpha \rightarrow \begin{bmatrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{N} \end{bmatrix} = \begin{bmatrix} x_{0} & \cdots & x_{M} & x_{0}|x_{0}| & \cdots & x_{M}|x_{M}|^{K-1} \\ x_{1} & \cdots & x_{M+1} & x_{1}|x_{1}| & \cdots & x_{M+1}|x_{M+1}|^{K-1} \\ x_{2} & \cdots & x_{M+2} & x_{2}|x_{2}| & \cdots & x_{M+2}|x_{M+2}|^{K-1} \\ x_{3} & \cdots & x_{M+3} & x_{3}|x_{3}| & \cdots & x_{M+3}|x_{M+3}|^{K-1} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ x_{N} & \cdots & x_{M+N} & x_{N}|x_{N}| & \cdots & x_{M+N}|x_{M+N}|^{K-1} \end{bmatrix} \cdot \begin{bmatrix} a_{10} \\ \vdots \\ a_{1M} \\ a_{20} \\ \vdots \\ a_{KM} \end{bmatrix}$$
(3.28)

where $y_i = y(n - i)$ and $x_i = x(n - i)$. There are two ways to reduce the complexity/number of samples in the above model. The first is to decimate the rows of the vector $\mathbf{y}$ and matrix $\mathbf{X}$ by a factor of $D$, as shown below:

$$\begin{bmatrix} y_{0} \\ y_{D} \\ \vdots \end{bmatrix} = \begin{bmatrix} x_{0} & \cdots & x_{M} & x_{0}|x_{0}| & \cdots & x_{M}|x_{M}|^{K-1} \\ x_{D} & \cdots & x_{M+D} & x_{D}|x_{D}| & \cdots & x_{M+D}|x_{M+D}|^{K-1} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \end{bmatrix} \cdot \begin{bmatrix} a_{10} \\ \vdots \\ a_{1M} \\ a_{20} \\ \vdots \\ a_{KM} \end{bmatrix}$$
(3.29)

This means that the output signal has been downsampled by a ratio of $D$ in the discrete-time domain, or equivalently undersampled by a factor of $D$ with respect to the sampling rate of the DPD input. Based on this approach, both direct-learning DPD and indirect-learning DPD have been designed and measured with decent performance in [29] and [35], respectively. It should be noted that the rows of the input basis matrix $\mathbf{X}$ are created in the high-rate domain, as the input signal $x(n)$ is not downsampled. Thus, by keeping one out of every $D$ rows, the decimated version of matrix $\mathbf{X}$ is created with the same number of parameters as the high-rate model, and the accuracy of the model is not compromised as long as the number of samples is high enough. All we need, therefore, is enough samples from the input and output to compare and extract the DPD parameters. However, how many samples are "enough"? It has been shown that this number should be large enough to ensure that the statistical properties of the sampled data and the TX signal are the same [35, 36]. This is because, for example, the PA nonlinearity behavior varies with any change in the biasing or temperature, which depends on the PDF profile of the TX signal, especially the PAPR. It is important to note that in this identification problem, the number of equations is much larger than the number of unknown parameters, i.e. we are dealing with an over-determined system of equations. Therefore, we only need enough samples distributed over the PDF to ensure that the PDF of the sampled signal is the same as that of the TX signal. Furthermore, the number of samples should be high enough to reduce the effect of the measurement noise. Thus, instead of a minimum "sampling rate", a minimum "number of samples" is required.
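A minimal sketch of this first approach, assuming NumPy is available: the MP basis matrix is built at the full rate, only one row in every $D$ is kept, and the same number of parameters is still recovered by LS. The coefficient values, noise level, signal statistics, and $M = 1$, $K = 3$, $D = 8$ are illustrative assumptions.

```python
import numpy as np

def mp_basis(x, memory_depth, order):
    """Memory-polynomial basis with columns ordered as in (3.28):
    column (k, m) holds x(n-m)|x(n-m)|^(k-1), k outer and m inner."""
    cols = []
    for k in range(1, order + 1):
        for m in range(memory_depth + 1):
            xm = np.roll(x, m)
            xm[:m] = 0                        # discard wrapped-around samples
            cols.append(xm * np.abs(xm)**(k - 1))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
n, M, K, D = 4000, 1, 3, 8
x = 0.7 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Hypothetical "true" parameters [a10, a11, a20, a21, a30, a31].
alpha_true = np.array([1.0, 0.1, 0.05j, -0.02, -0.1, 0.03], dtype=complex)
X = mp_basis(x, M, K)                         # built in the high-rate domain
noise = 1e-4 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = X @ alpha_true + noise

# Keep one row out of every D: the output is undersampled by D, but each
# retained row still uses high-rate input samples and all (M+1)*K parameters.
X_dec, y_dec = X[::D], y[::D]
alpha_dec, *_ = np.linalg.lstsq(X_dec, y_dec, rcond=None)
max_coeff_error = np.max(np.abs(alpha_dec - alpha_true))
```

The decimated system remains heavily over-determined (500 equations for 6 unknowns in this sketch), which is why the number of retained samples, rather than the rate at which they were taken, determines the quality of the fit.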

The second approach is to downsample/undersample both the input and the output by the same factor of $D$ and then create the model at a low rate. This is equivalent to decimating not only the rows of vector $\mathbf{y}$ and matrix $\mathbf{X}$, but also the columns of the matrix $\mathbf{X}$ and the rows of the parameter vector $\alpha$, as proposed in [7]. However, decimating the parameters of the model by a factor of $D$ with respect to memory depth is equivalent to undersampling the impulse response of the continuous-time equivalent of the nonlinear system. This can compromise the accuracy of the model if the bandwidth of the nonlinear system is not very large (i.e. if the memory depth in the time domain is so great that the system settles relatively slowly). Since in this approach the DPD model parameters are already decimated and extracted at a low rate, they should be upsampled before being used in the DPD, which runs at the original high sampling rate. This can be done by zero-padding, i.e. inserting $D - 1$ zeros between each two consecutive parameters of the same nonlinearity order, as follows:

$$\alpha_{\rm HR} = \begin{bmatrix} \alpha_{1,\rm LR} & \alpha_{2,\rm LR} & \cdots & \alpha_{K,\rm LR} \end{bmatrix}$$
(3.30a)

$$\alpha_{k,\rm LR} = \begin{bmatrix} \alpha_{k0} & \underbrace{0 \;\cdots\; 0}_{D-1} & \alpha_{k1} & \underbrace{0 \;\cdots\; 0}_{D-1} & \alpha_{k2} & \cdots & \alpha_{kM} \end{bmatrix}$$
(3.30b)

where $\alpha_{\rm HR}$ and $\alpha_{k,\rm LR}$ are the DPD parameter vectors at the high rate and the low rate, respectively, and $\alpha_{km}$ is the DPD parameter extracted at a low rate. In practice, this can be implemented by inserting unit delays into the FIR filter of each nonlinear kernel of the system shown in Fig. 3.1b [7].
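The zero-padding in (3.30) can be sketched in a few lines; the function names and the example tap values are hypothetical.

```python
def zero_pad_kernel(taps, factor):
    """Insert factor-1 zeros between consecutive low-rate taps of one
    nonlinearity order, as in (3.30b)."""
    padded = []
    for i, tap in enumerate(taps):
        if i > 0:
            padded.extend([0] * (factor - 1))
        padded.append(tap)
    return padded

def upsample_parameters(kernels, factor):
    """Zero-pad the taps of every nonlinearity order and concatenate them
    into one high-rate parameter vector, as in (3.30a)."""
    high_rate = []
    for taps in kernels:
        high_rate.extend(zero_pad_kernel(taps, factor))
    return high_rate

low_rate = [[1.0, 0.2, 0.05], [0.1, 0.02, 0.01]]   # hypothetical taps for k = 1, 2
high_rate = upsample_parameters(low_rate, 3)        # D = 3
```

Each inserted zero corresponds to one extra unit delay between taps of the kernel's FIR filter, so the high-rate filter spans the same time window as the low-rate one.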

### **3.3.5.** CHALLENGES OF DPD

In a typical digital polar TX, the input I/Q data are converted to amplitude and phase and then predistorted by two independent LUT-DPDs. To correct for memory effects, some sort of mathematical DPD is preferably needed to predistort the input signal in the Cartesian domain. At the end of the TX chain, the up-converted phase signal and the amplitude signal are combined by the DPA, which implicitly acts as a multiplier. All of these operations are highly nonlinear and result in extensive bandwidth expansion in the amplitude and phase paths. On the other hand, since a DPA operates as an RF-DAC [37], it requires a very high sampling rate to attenuate the spectral replicas and push them away from the carrier frequency. Therefore, when aiming at large video bandwidths, a very high-speed DPD with an effective sampling rate of up to 10-20× the bandwidth of the input signal is required. In a low-power application, such a high-speed DPD can consume up to 5-6× the power of the driver stages when transmitting an OFDM signal, for example. For a Cartesian TX, using LUT-DPD is even more complicated since a 2D LUT is typically required [38–41], whereas a polar TX can use two independent 1D LUTs [42–48]. Hence, while a DPD might be used to linearize the DPA, it would not be an optimal solution, at least not by itself, especially for low-power applications.

### **3.3.6.** DPD-LESS LINEARIZATION

A fully DPD-based linearization solution can become very complicated in terms of implementation, and therefore too expensive in terms of cost and power consumption. It therefore seems reasonable to consider circuit-level solutions as a means of reducing the nonlinearity of the DPA to a level at which the DPD is no longer needed, or at least at which the DPD requirements/complexity are relaxed significantly. Considering this, in [49] the bias point of a class-B PA array is adaptively tuned using feedback from an analog AM replica to eliminate the DPD, while the works in [50, 51] exploit linear structures for the modulator at low output power (~1 dBm) at the expense of lower drain efficiency. In [45], the phase is dynamically modified at the input analog PM path by varactors to correct for the ACW-PM distortion, and in [48] feedforward capacitors are used in each DPA cell to minimize the drain capacitance variations, thereby reducing the ACW-PM distortion. Both of these works still rely on LUT-DPD for ACW-AM correction.

In Chapters 4 and 5, several circuit-level solutions (such as nonlinear sizing, multiphase RF clocking, and overdrive-voltage control) are proposed to linearize class-E DPAs in a polar TX, both in single and Doherty DPA configurations.

# **3.4.** CONCLUSION

In this chapter, different behavioral modeling techniques based on the Volterra series as well as parameter estimation techniques are described. Two truncated versions of the Volterra series, in which all or some of the cross-terms are omitted to reduce the order/complexity of the model, known as MP and GMP models, are described. In addition, to establish the foundation of digital predistortion in baseband rather than in RF, it is shown how the real-signal passband nonlinearity is translated to complex-signal baseband nonlinearity. Also, the equivalent baseband models of the AM-AM and AM-PM conversions are analytically calculated. In addition, new basis functions are proposed to better match the nonlinearity of switch-mode DPAs, hence reducing the order of the nonlinear kernels significantly.

Based on the nonlinear models, different digital predistortion techniques including mathematical and polyphase LUT DPD, adaptive DPD, signal and data DPD, direct-learning and indirect-learning DPD are described. Furthermore, the undersampling techniques for DPD model extraction are described.

Finally, it is concluded that although DPD is a very popular solution for linearizing a nonlinear high-power PA, for low-power and/or wideband applications, the overhead of DPD power consumption/cost makes the DPD-less solutions much more attractive.

In Chapters 4 and 5, novel circuit-level linearization techniques are introduced to linearize single and Doherty class-E DPAs, respectively. In addition, in Chapter 6, the system-level considerations to push the linearity limits of a digital polar TX (with a single or Doherty DPA) are explained, and a novel DPD is introduced.

### REFERENCES

- [1] E. Gregorio, J. Cousseau, S. Werner, T. Riihonen, and R. Wichman, *Power amplifier linearization technique with IQ imbalance and crosstalk compensation for broadband MIMO-OFDM transmitters*, EURASIP Journal on Advances in Signal Processing (2011).
- [2] W. Rugh, *Nonlinear System Theory: The Volterra / Wiener Approach* (The Johns Hopkins University Press, 1981).
- [3] M. O. Franz, Volterra and Wiener series, Scholarpedia 6, 11307 (2011), revision #137027.
- [4] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, A Generalized Memory Polynomial Model for Digital Predistortion of RF Power Amplifiers, IEEE Transactions on Signal Processing 54, 3852 (2006).
- [5] F. M. Ghannouchi and O. Hammi, *Behavioral modeling and predistortion*, IEEE Microwave Magazine 10, 52 (2009).

- [6] A. Zhu, J. C. Pedro, and T. J. Brazil, *Dynamic Deviation Reduction-Based Volterra Behavioral Modeling of RF Power Amplifiers*, IEEE Transactions on Microwave Theory and Techniques 54, 4323 (2006).
- [7] A. Zhu, P. J. Draxler, J. J. Yan, T. J. Brazil, D. F. Kimball, and P. M. Asbeck, Open-Loop Digital Predistorter for RF Power Amplifiers Using Dynamic Deviation Reduction-Based Volterra Series, IEEE Transactions on Microwave Theory and Techniques 56, 1524 (2008).
- [8] A. Zhu, J. C. Pedro, and T. R. Cunha, Pruning the Volterra Series for Behavioral Modeling of Power Amplifiers Using Physical Knowledge, IEEE Transactions on Microwave Theory and Techniques 55, 813 (2007).
- [9] L. Ding and G. T. Zhou, Effects of even-order nonlinear terms on power amplifier modeling and predistortion linearization, IEEE Transactions on Vehicular Technology 53, 156 (2004).
- [10] R. Singla and S. Sharma, Digital predistortion of power amplifiers using look-up table method with memory effects for LTE wireless systems, EURASIP Journal on Wireless Communications and Networking (2012).
- [11] M. Rawat, K. Rawat, F. M. Ghannouchi, S. Bhattacharjee, and H. Leung, *Generalized Rational Functions for Reduced-Complexity Behavioral Modeling and Digital Predistortion of Broadband Wireless Transmitters*, IEEE Transactions on Instrumentation and Measurement 63, 485 (2014).
- [12] T. R. Cunha, P. M. Lavrador, E. G. Lima, and J. C. Pedro, *Rational function-based model with memory for power amplifier behavioral modeling*, in 2011 Workshop on Integrated Nonlinear Microwave and Millimetre-Wave Circuits (2011) pp. 1–4.
- [13] A. S. Goldberger, *Econometric theory* (John Wiley, 1964).
- [14] Ordinary Least Squares, available at https://en.wikipedia.org/wiki/Ordinary\_least\_squares, (accessed: 20.04.2020).
- [15] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, *A fully-integrated digital-intensive polar Doherty transmitter*, in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2017) pp. 196–199.

- [16] T. Tango, A. Yamaoka, K. Yamaguchi, and Y. Tanabe, Simplified Temperature Compensation Technique for Digital Predistorter Using Fixed Coefficients, in 2010 IEEE 72nd Vehicular Technology Conference - Fall (2010) pp. 1–5.
- [17] A. N. Lozhkin and M. Nakamura, A new digital predistorter linearizer for wide band signals, in 2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications (2011) pp. 1376–1380.
- [18] A. Farabegoli, B. Sogl, J. Mueller, and R. Weigel, A novel method to perform adaptive memoryless polynomial digital predistortion, in 2013 European Microwave Conference (2013) pp. 404–407.
- [19] P. L. Gilabert, A. Cesari, G. Montoro, E. Bertran, and J. Dilhac, *Multi-Lookup Table FPGA Implementation of an Adaptive Digital Predistorter for Linearizing RF Power Amplifiers With Memory Effects*, IEEE Transactions on Microwave Theory and Techniques 56, 372 (2008).
- [20] Y. Ma, Y. Yamao, Y. Akaiwa, and C. Yu, FPGA Implementation of Adaptive Digital Predistorter With Fast Convergence Rate and Low Complexity for Multi-Channel Transmitters, IEEE Transactions on Microwave Theory and Techniques 61, 3961 (2013).
- [21] H. Qian, H. Huang, and S. Yao, A general adaptive digital predistortion architecture for stand-alone rf power amplifiers, IEEE Transactions on Broadcasting 59, 528 (2013).
- [22] Y. Liu, J. J. Yan, H. Dabag, and P. M. Asbeck, Novel Technique for Wideband Digital Predistortion of Power Amplifiers With an Under-Sampling ADC, IEEE Transactions on Microwave Theory and Techniques 62, 2604 (2014).
- [23] D. Zhou and V. E. DeBrunner, *Novel Adaptive Nonlinear Predistorters Based on the Direct Learning Algorithm*, IEEE Transactions on Signal Processing 55, 120 (2007).
- [24] Designing Polyphase DPD Solutions with 28-nm FPGAs, available at https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01171-polyphase-dpd.pdf?wapkw=dpd, (accessed: 20.04.2020).
- [25] A. Oppenheim and R. Schafer, *Discrete-time Signal Processing*, Prentice-Hall signal processing series (Pearson; 3 edition, 2009).

- [26] M. Schetzen, *Theory of pth-order inverses of nonlinear systems*, IEEE Transactions on Circuits and Systems 23, 285 (1976).
- [27] M. Schetzen, *The Volterra and Wiener Theories of Nonlinear Systems* (Krieger Pub., 2006).
- [28] R. Mittelhammer, G. Judge, and D. Miller, *Econometric Foundations Pack with CD-ROM* (Cambridge University Press, 2000).
- [29] L. Ding, F. Mujica, and Z. Yang, Digital predistortion using direct learning with reduced bandwidth feedback, in 2013 IEEE MTT-S International Microwave Symposium Digest (MTT) (2013) pp. 1–3.
- [30] P. L. Gilabert, G. Montoro, T. Wang, M. N. Ruiz, and J. A. García, *Comparison of model order reduction techniques for digital predistortion of power amplifiers*, in 2016 46th European Microwave Conference (EuMC) (2016) pp. 182–185.
- [31] C. Yu, L. Guan, E. Zhu, and A. Zhu, *Band-Limited Volterra Series-Based Digital Predistortion for Wideband RF Power Amplifiers*, IEEE Transactions on Microwave Theory and Techniques 60, 4198 (2012).
- [32] Y. Ma, Y. Yamao, Y. Akaiwa, and K. Ishibashi, Wideband Digital Predistortion Using Spectral Extrapolation of Band-Limited Feedback Signal, IEEE Transactions on Circuits and Systems I: Regular Papers 61, 2088 (2014).
- [33] Yang-Ming Zhu, Generalized sampling theorem, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 39, 587 (1992).
- [34] W. A. Frank, Sampling requirements for Volterra system identification, IEEE Signal Processing Letters 3, 266 (1996).
- [35] Z. Wang, L. Guan, and R. Farrell, Compact undersampled digital predistortion for flexible single-chain multi-band RF transmitter, in 2017 IEEE MTT-S International Microwave Symposium (IMS) (2017) pp. 1542–1545.
- [36] Z. Wang, J. Dooley, K. Finnerty, and R. Farrell, Selection of compressed training data for RF power amplifier behavioral modeling, in 2015 10th European Microwave Integrated Circuits Conference (EuMIC) (2015) pp. 53–56.
- [37] S. Luschas, R. Schreier, and H.-S. Lee, *Radio Frequency Digital-to-Analog Converter*, IEEE J. of Solid-State Circuits 39, 1462 (2004).

- [38] C. Lu, H. Wang, C. Peng, A. Goel, S. Son, P. Liang, A. Niknejad, H. Hwang, and G. Chien, A 24.7dBm All-Digital RF Transmitter for Multimode Broadband Applications in 40nm CMOS, in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers (2013) pp. 332–333.
- [39] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, A Wideband 2 × 13-bit All-Digital I/Q RF-DAC, IEEE Trans. on Microw. Theory Techni. 62, 732 (2014).
- [40] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, *A Quadrature Switched Capacitor Power Amplifier*, IEEE J. of Solid-State Circuits 51, 1200 (2016).
- [41] Z. Deng, E. Lu, E. Rostami, D. Sieh, D. Papadopoulos, B. Huang, R. Chen, H. Wang, W. Hsu, C. Wu, and O. Shanaa, 9.5 A Dual-Band Digital-WiFi 802.11a/b/g/n Transmitter SoC with Digital I/Q Combining and Diamond Profile Mapping for Compact Die Area and Improved Efficiency in 40nm CMOS, in 2016 IEEE International Solid-State Circuits Conference (ISSCC) (2016) pp. 172–173.
- [42] C. D. Presti, F. Carrara, A. Scuderi, P. M. Asbeck, and G. Palmisano, A 25 dBm Digitally Modulated CMOS Power Amplifier for WCDMA/EDGE/OFDM With Adaptive Digital Predistortion and Efficient Power Control, IEEE J. of Solid-State Circuits 44, 1883 (2009).
- [43] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology, IEEE J. of Solid-State Circuits 46, 1796 (2011).
- [44] L. Ye, J. Chen, L. Kong, E. Alon, and A. Niknejad, Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation, IEEE J. of Solid-State Circuits 48, 3160 (2013).
- [45] J. S. Park, S. Hu, Y. Wang, and H. Wang, A Highly Linear Dual-Band Mixed-Mode Polar Power Amplifier in CMOS with An Ultra-Compact Output Network, IEEE J. of Solid-State Circuits 51, 1756 (2016).
- [46] V. Vorapipat, C. Levy, and P. Asbeck, A Class-G Voltage-Mode Doherty Power Amplifier, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 46–47.
- [47] D. Cousinard, R. Winoto, H. Li, Y. Fang, A. Ghaffari, A. Olyaei, O. Carnu, P. Godoy, A. Wong, X. Zhao, J. Liu, A. Mitra, R. Tsang, and L. Lin, *A 0.23mm<sup>2</sup> Digital Power Amplifier with Hybrid Time/Amplitude Control Achieving 22.5dBm at 28% PAE for 802.11g*, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 228–229.

- [48] J. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, A 24dBm 2-to-4.3GHz Wideband Digital Power Amplifier with Built-In AM-PM Distortion Self-Compensation, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 230–231.
- [49] S. Zheng and H. C. Luong, A WCDMA/WLAN Digital Polar Transmitter With Low-Noise ADPLL, Wideband PM/AM Modulator, and Linearized PA, IEEE J. of Solid-State Circuits 50, 1645 (2015).
- [50] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, *An Incremental-Charge-Based Digital Transmitter With Built-in Filtering*, IEEE J. of Solid-State Circuits 50, 3065 (2015).
- [51] A. Ba, Y. H. Liu, J. van den Heuvel, P. Mateman, B. Büsze, J. Dijkhuis, C. Bachmann, G. Dolmans, K. Philips, and H. D. Groot, A 1.3 nJ/b IEEE 802.11ah Fully-Digital Polar Transmitter for IoT Applications, IEEE J. of Solid-State Circuits 51, 3103 (2016).

# 4

# AN INTRINSICALLY LINEAR WIDEBAND POLAR DIGITAL POWER AMPLIFIER

This chapter follows to a great extent the papers [1] and [2], published in JSSC 2017 and IMS 2017, respectively. This might yield a slight overlap with some parts of Chapter 5, but it was not changed dramatically in order to preserve coherence.

# **4.1.** INTRODUCTION

When designing modern wireless digital transmitters (TXs), system integration, output power, energy efficiency, and bandwidth are considered to be the key parameters. These parameters are highly influenced by the linearity and efficiency of the digital power amplifier (DPA) as the final stage. The TX output power is directly controlled by the number of enabled sub-PAs that effectively change the overall width  $(W_{eff})$  of the active devices in the output stage [3–16]. Unfortunately, an energy-efficient DPA, normally implemented in class-E, D, or D<sup>-1</sup> [17–20], is typically highly nonlinear [21]. In a conventional "linearly sized" switch-mode DPA, as shown in Fig. 4.1a, the effective size of the DPA is proportional to the digital amplitude-control-word (ACW), showing significant nonlinearities in its ACW-AM and ACW-PM characteristics. The most widely used approach to correct for these nonlinearities is digital predistortion (DPD), normally implemented as lookup tables (LUT) [3–7, 9–12, 22, 23].

As mentioned in Chapter 3, for wideband signals a (LUT) DPD solution should run at a very high speed, which can consume up to 5-6× the power consumption of the driver stages when transmitting an OFDM signal. To avoid the hardware and speed constraints of a practical DPD, in this chapter, a linear polar DPA with novel linearization techniques to circumvent the DPD is presented. As shown in Fig. 4.1b, by introducing nonlinear sizing of the sub-PA segments, overdrive voltage control, and multiphase RF clocking, the DPA can be linearized without sacrificing the efficiency. To provide a fundamental understanding of the proposed concept, in Section 4.2, the linearity of a class-E DPA is analyzed followed by a description of the proposed linearization techniques in Section 4.3. In Section 4.4.1, the implementation details of the main blocks are described, and the measurement results and the conclusion are presented in Sections 4.5 and 4.6, respectively.

# **4.2.** CLASS-E DPA LINEARITY ANALYSIS

Switch-mode power amplifiers can be driven directly by digital signals. Therefore, they are logical candidates to be used in a digital-intensive TX solution. As mentioned before, a class-E DPA can achieve high efficiency using a simple output matching network [17, 18, 24]. However, it has significant nonlinearity in its ACW-AM and ACW-PM conversions. In the following, the linearity of a class-E DPA is analyzed.



Figure 4.1: (a) Conventional polar DPA with "linear sizing" resulting in ACW-AM and ACW-PM distortion, and (b) proposed polar DPA with "nonlinear sizing", "multiphase RF clocking", and "overdrive voltage control".



Figure 4.2: (a) Conventional single-ended 9-bit class-E DPA, and (b) time domain output waveforms showing ACW-AM and ACW-PM distortion.

# 4.2.1. DC CHARACTERISTIC CURVE AND DYNAMIC LOAD LINES

In a switch-mode DPA, unlike an analog PA, the amplitude of the input RF signal applied to the gate of the transistors is constant for different output power levels while the overall effective width of the switched-on devices varies according to the input ACW. The ratio of the total effective width $W_{eff}$ to the width of one unit device $W_0$ is defined as the relative sizing factor $K_W = W_{eff}/W_0$. Therefore, to analyze the DC output characteristics of a class-E DPA as a function of $K_W$, $I_{DS}$ versus $V_{DS}$ curves can be plotted for different values of $K_W$ with a fixed $V_{GS}$. In Fig. 4.3a, the $I_{DS}$ vs. $V_{DS}$ curves are plotted for a 9-bit DPA with 511 uniform NMOS switches with $V_{GS} = 1.1$ V. For each curve, two main operation regions can be distinguished: triode (for small $V_{DS}$) and saturation (for large $V_{DS}$). The dynamic load lines for a typical nonideal class-E DPA with $V_{DD} = 0.5$ V are simulated and plotted in Fig. 4.3b. It can be seen that, as $K_W$ increases, the swing of the drain voltage increases and pushes the operation region from a semi-saturation region toward a triode region. At large values of $K_W$, the DPA is fully switched between triode and off-mode regions, resulting in the typical class-E drain voltage waveform shown in Fig. 4.3c. Likewise, as $K_W$ increases, the drain current waveform changes from a square wave into a typical class-E waveform, as shown in Fig. 4.3d. For small values of $K_W$ (< 30), the DPA can be modeled as a digitally controlled switching "current source" DPA, which is a linear DPA. For large values of $K_W$, it can be modeled as a "resistive mode" class-E DPA in which the on-resistance is modulated, resulting in highly nonlinear behavior. The "linear" range of the DPA can be extended by either increasing $V_{DD}$ (which raises reliability issues) or decreasing $V_{GS}$. However, in practice, the overall linearity is still highly influenced and degraded by the "nonlinear region".

# **4.2.2.** ANALYSIS OF THE ACW-AM AND ACW-PM DISTORTION MECHANISM

The class-E DPA is implemented in a push-pull configuration with a transformer (TRF)-based balun to suppress even harmonics at the output. The principle schematic of the DPA and its lumped model are illustrated in Fig. 4.4a and Fig. 4.4b, respectively. The odd-mode equivalent circuit is shown in Fig. 4.4c. In general, a class-E PA is not a linear time-invariant (LTI) system. Therefore, for theoretical simplicity, we use a Norton equivalent model in which the switching transistors are replaced with a set of parallel current sources to analyze the dependency of the output amplitude and phase on $K_W$. These current sources represent the Fourier series of the drain current, as shown in Fig. 4.4d. To model the nonlinearity of the resulting drain current caused by the limited output resistance, a resistor that is inversely proportional to $K_W$ is included in parallel with the current source. Consequently, the output current of the transistors is given by:

$$I_{D} = \sum_{i}^{N} \left( f_{i}(K_{W}) I_{Hi} - I_{RD} \right) = \sum_{i}^{N} \left( f_{i}(K_{W}) I_{Hi} \left( 1 - \frac{K_{W} Z_{IN}}{R_{D0}} \right) \right)$$
(4.1)

where $f_i(K_W)$ represents the amplitude and phase of the i<sup>th</sup> harmonic as a function of $K_W$, $I_{Hi}$ is the i<sup>th</sup> harmonic component of the current, $R_{D0}$ is the output resistance for a unit transistor, $Z_{IN}$ is the total input impedance seen by the current sources, and $I_{RD}$ is the nonlinear component of the output current $I_D$ with regard to $K_W$.

Figure 4.3: Simulated (a) DC curves of $I_{DS}$ vs. $V_{DS}$ for different $K_W$ with a fixed $V_{GS} = 1.1$ V, (b) dynamic load lines for a typical class-E DPA with $V_{DD} = 0.5$ V, (c) drain voltage waveforms, and (d) drain current waveforms.

Since the variation of the transistors' drain capacitance is small, for now, we consider $C_D$ to be almost constant<sup>1</sup>. Next, we assume that the amplitude of the first harmonic related to the current sources increases proportionally to $K_W$ (i.e. $f_1(K_W) = K_W$). However, the eventual drain current $I_D$ will increase nonlinearly due to its decreasing output resistance (unless it sees zero load impedance). By neglecting the higher harmonics and considering only the first harmonic of input current, $I_{H1}$, the output voltage is calculated as follows:

<sup>1</sup> $C_{D} = K_{W}(C_{DS0} + C_{GD,Triode,0}) + (K_{TOT} - K_{W})(C_{DS0} + C_{GD,OFF,0}) + C_{PAR} + C_{EXT}$. Without $C_{PAR} + C_{EXT}$, the variation of $C_{D}$ is less than 5%; with a large $C_{PAR} + C_{EXT}$, it can be less than 1%.



Figure 4.4: (a) Push-pull class-E DPA, (b) its lumped model, (c) its odd-mode half circuit, and (d) its odd-mode half circuit LTI (Norton equivalent) model for simplified theoretical analysis of linearity.

$$V_{OUT} = \frac{(K_W I_{H1} R_{LP}) \times j\omega L_2 R_{D0}}{R_{D0} R_{LP} - \omega^2 \left[ K_W L_1 L_2 + (L_1 + L_2) C_D R_{D0} R_{LP} \right] + j\omega \left[ L_2 R_{D0} + K_W (L_1 + L_2) R_{LP} - \omega^2 L_1 L_2 C_D R_{D0} \right]}$$
(4.2)

where $L_1$ and $L_2$ are the leakage and magnetizing inductances, respectively. $R_{LP} = 25(K_m/N)^2$ is the load resistance seen from the primary side of the transformer, where $K_m$ is the magnetic coupling factor of the transformer and N is the turn ratio. $C_D$ is the total drain capacitance. The first term in the numerator in (4.2) represents the linear gain of the DPA, while the remainder represents the term responsible for amplitude and phase distortion. The above equation can be used both for the linear operation region of the DPA, where $R_{D0}$ is large, and for the nonlinear region, where $R_{D0}$ is smaller. In Fig. 4.5, the simulated and calculated $K_W$-AM and $K_W$-PM curves for a class-E DPA with a total width of 2.5 mm are plotted. By assuming $K_m \approx 1$ in the transformer for theoretical simplicity, the analytical solutions for the output amplitude and phase error are calculated as follows:

$$AM(K_W) \approx K_W I_{H1} R_{LP} \left( \frac{\omega q_D^2 L_2 R_{D0}}{\sqrt{(q_D^2 R_{D0} R_{LP} - R_{D0} R_{LP})^2 + \omega^2 q_D^4 (L_2 R_{D0} + K_W L_2 R_{LP})^2}} \right)$$
(4.3)



Figure 4.5: Simulated and calculated (a) ACW-AM conversion curves of a class-E DPA, and (b) ACW-PM conversion curve of a class-E DPA.

$$\phi_{Err}(K_W) \approx 90^\circ - \arctan\left(\frac{\omega q_D^2 L_2(R_{D0} + K_W R_{LP})}{(q_D^2 - 1)R_{D0}R_{LP}}\right)$$
(4.4)

where $q_D = \frac{1}{\omega \sqrt{L_2 C_D}}$ is the resonance factor. For the nonlinear region ($K_W > 30$), $R_{D0}$ is assumed to be one third of its value in the linear region. Although, as mentioned earlier, two different modes of operation are distinguished, in both modes the amplitude and phase distortion is mainly caused by the large variation in the effective output resistance of the transistors. The variation in $C_D$ is ~0.043 fF/μm (less than 1%), which has a negligible effect on the normalized $K_W$-AM curve. Even a higher variation of 0.1 fF/μm increases the phase distortion by only one degree, as shown in Fig. 4.5b.
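As a numerical cross-check of (4.3) and (4.4), the sketch below solves the Norton model of Fig. 4.4d directly for $L_1 = 0$ (i.e. $K_m = 1$) and compares the result with the two closed-form expressions. All component values are hypothetical and were chosen only for illustration; they are not taken from the design described in this chapter.

```python
import cmath
import math

# Hypothetical component values, chosen only for illustration.
F_C = 2.2e9                              # carrier frequency (Hz)
OMEGA = 2 * math.pi * F_C
L2 = 1.0e-9                              # magnetizing inductance (H)
Q_D = 1.2                                # resonance factor q_D
C_D = 1.0 / (OMEGA**2 * Q_D**2 * L2)     # from q_D = 1 / (omega * sqrt(L2 * C_D))
R_LP = 12.5                              # load seen from the primary side (ohm)
R_D0 = 1500.0                            # output resistance of one unit device (ohm)
I_H1 = 1.0e-3                            # fundamental current of one unit device (A)


def vout_network(k_w):
    """Output phasor of the Norton model with L1 = 0: all branches are in parallel."""
    y_total = (k_w / R_D0 + 1j * OMEGA * C_D
               + 1.0 / (1j * OMEGA * L2) + 1.0 / R_LP)
    return k_w * I_H1 / y_total


def am_eq43(k_w):
    """Output amplitude according to Eq. (4.3)."""
    q2 = Q_D**2
    num = OMEGA * q2 * L2 * R_D0
    den = math.sqrt((q2 * R_D0 * R_LP - R_D0 * R_LP)**2
                    + OMEGA**2 * q2**2 * (L2 * R_D0 + k_w * L2 * R_LP)**2)
    return k_w * I_H1 * R_LP * num / den


def phi_err_eq44_deg(k_w):
    """Phase error according to Eq. (4.4), in degrees (valid for q_D > 1)."""
    q2 = Q_D**2
    arg = OMEGA * q2 * L2 * (R_D0 + k_w * R_LP) / ((q2 - 1.0) * R_D0 * R_LP)
    return 90.0 - math.degrees(math.atan(arg))


for k_w in (1, 30, 511):
    v = vout_network(k_w)
    print(k_w, abs(v), am_eq43(k_w),
          math.degrees(cmath.phase(v)), phi_err_eq44_deg(k_w))
```

With these assumed values the amplitude grows far less than proportionally to $K_W$ and the phase drifts by several degrees over the code range, which is the compression and ACW-PM behaviour described above.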

### 4.2.3. POWER AND EFFICIENCY ROLL-OFF

In a switch-mode amplifier, the output power is a function of  $K_W$ , which is given by:

$$P_{OUT} = \frac{|V_{OUT}|^2}{2R_{LP}} = \frac{AM^2(K_W)}{2R_{LP}}$$
(4.5)

where  $AM(K_W)$  is given by (4.3). The DC power consumption of the PA can be calculated as follows:

$$P_{DC} = V_{DD}I_{DC} = P_{DC,Max}P_{NORM}(AM(K_W))$$
(4.6)

where $P_{DC,Max} = V_{DD}I_{DC,Max} = K_P V_{DD}^2 / R_{LP}$, in which $K_P$ is the class-E power scaling factor [25]. $P_{NORM}(AM)$ is a unitless, monotonically increasing function of the output voltage, normalized to the range (0:1). The curvature of $P_{NORM}(AM)$ depends on the impedance seen by the drain of the transistors at all harmonics, with the first and second being dominant.

Figure 4.6: Simulated DE of an ideal class-E/F2, class-E and class-B (D)PA vs. normalized output voltage.

For simplicity, if we ignore the second harmonic impedance (which in practical terms means that the second harmonic is an open circuit, as in class-F<sup>-1</sup> or class-E/F<sub>2</sub> tuning), then $P_{NORM} \approx \frac{AM(K_W)}{AM_{Max}}$, so the DC power and drain efficiency are given by:

$$P_{DC} = V_{DD}I_{DC} = \frac{K_P V_{DD}^2}{R_{LP}AM_{Max}}AM(K_W)$$

$$(4.7)$$

$$DE = \frac{P_{OUT}}{P_{DC}} \approx \left(\frac{AM^2(K_W)}{2R_{LP}}\right) / \left(K_P V_{DD}^2 \frac{AM(K_W)}{R_{LP}AM_{Max}}\right) = \frac{AM_{Max}}{2K_P V_{DD}^2} AM(K_W)$$
(4.8)

The simulated drain efficiency (DE) versus the normalized output AM is plotted in Fig. 4.6 for an ideal class-E, class- $E/F_2$  and class-B (D)PA. It can be seen that the DE of a class- $E/F_2$  DPA is almost a linear function of the output amplitude similar to class-B PA and it achieves higher DE at the back-off power compared to a class-E DPA due to its higher second harmonic impedance.

# **4.3.** PROPOSED LINEARIZATION TECHNIQUES

### 4.3.1. NONLINEAR SIZING

As mentioned before, in a conventional DPA, the total effective size of active devices is a linear function of the input ACW, which we refer to as *linear sizing* or *segmentation*. However, according to (4.3) and the simulation results shown in Fig. 4.5a, as the effective size of the DPA increases linearly, the output amplitude increases nonlinearly due to the on-resistance modulation.

Figure 4.7: (a) Total effective size $W_{eff}$ ($\mu$m) vs. ACW, (b) simulated normalized output AM vs. $W_{eff}$ ($\mu$m), and (c) resulting simulated ACW-AM curves for a DPA with linear sizing, nonlinear sizing, and segmented nonlinear sizing.

For simplicity, if we assume $q_D = 1$ in (4.3) (similar to class-$D^{-1}$ or class-$F^{-1}$), we derive a simple equation to describe the amplitude nonlinearity as follows, which is similar to the calculated results in [5, 26]:

$$AM(K_W) = \frac{K_W R_{D0}}{K_W R_{LP} + R_{D0}} R_{LP} I_{H1} = \frac{K_W}{K_W K_{NL} + 1} R_{LP} I_{H1}$$
(4.9)

where  $K_{NL} = R_{LP}/R_{D0}$  is defined as the nonlinearity factor. As  $K_{NL}$  increases (by a lower  $R_{D0}$  or higher  $R_{LP}$ ), which is beneficial for increasing the drain efficiency (DE), the concavity of the ACW-AM curve of a linearly sized DPA increases. In a linearly sized DPA,  $W_{eff,L} = f(ACW) = W_0.ACW$  and thus  $K_W = ACW$  which results in high ACW-AM distortion, as simulated and depicted in Fig. 4.7. However, by nonlinearly sizing the sub-PA cells, i.e., making the sizing factor  $K_W$  a nonlinear function of the ACW, it is possible to linearize the ACW-AM conversion without the need to predistort the ACW data, as shown in Fig. 4.7. Thus, by assuming  $AM(K_W) = G.ACW.R_{LP}I_{H1}$ , where *G* is a constant, we obtain:

$$K_W = \frac{G.ACW}{1 - G.K_{NL}ACW} \tag{4.10}$$

In order to achieve the same total effective size of the DPA as a linearly sized DPA, we should have  $G = 1/(K_{NL}ACW_{Max} + 1)$ . Thus, the total nonlinear effective size ( $W_{eff,NL}$ ) is given by:

$$W_{eff,NL}[ACW] = \frac{W_0.ACW}{1 + K_{NL}(ACW_{Max} - ACW)}$$
(4.11)
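The following minimal sketch checks numerically that the sizing law of (4.11), inserted into the amplitude model of (4.9), yields an output amplitude proportional to the ACW. The value of $K_{NL}$ is a hypothetical example, and amplitudes are normalized to $R_{LP} I_{H1}$.

```python
def am_normalized(k_w, k_nl):
    """Output amplitude of Eq. (4.9), normalized to R_LP * I_H1."""
    return k_w / (k_w * k_nl + 1.0)


def w_eff_nl(acw, acw_max, k_nl, w0=1.0):
    """Nonlinear total effective width of Eq. (4.11)."""
    return w0 * acw / (1.0 + k_nl * (acw_max - acw))


ACW_MAX = 511      # 9-bit DPA
K_NL = 0.005       # hypothetical nonlinearity factor R_LP / R_D0
G = 1.0 / (K_NL * ACW_MAX + 1.0)   # gain constant that preserves the total size

for acw in (1, 128, 256, 511):
    k_w = w_eff_nl(acw, ACW_MAX, K_NL)          # w0 = 1, so W_eff equals K_W
    ratio_nl = am_normalized(k_w, K_NL) / (G * acw)
    ratio_lin = am_normalized(acw, K_NL) / (G * acw)
    print(acw, round(ratio_nl, 6), round(ratio_lin, 3))
```

The first ratio stays at one for every code (a straight ACW-AM line), whereas the linearly sized array in the last column deviates strongly at low codes.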

From (4.9)-(4.11), we can calculate the extra dynamic range (DR) that we gain by using nonlinear sizing compared to a linearly sized DPA.

Figure 4.8: Simulated (a) ACW-AM and (b) output PSD of a nonlinearly sized DPA for different numbers of segments assuming no ACW-PM or other type of non-ideality.

The output power dynamic range of the DPA can be defined as the ratio between its maximum and minimum usable output power levels, i.e. $DR(dB) = 10\log\left(\frac{P_{OUT}|_{ACW=Max}}{P_{OUT}|_{ACW=1}}\right)$. If we assume that the total size and the resolution of a linearly sized DPA and a nonlinearly sized DPA are the same, their maximum output power is also the same. Therefore, by dividing their amplitudes at ACW = 1, we obtain:

$$\Delta DR(dB) = 20\log\left(\frac{1 + K_{NL}ACW_{Max}}{1 + K_{NL}}\right) \approx dB(1 + K_{NL}ACW_{Max})$$
(4.12)

As can be seen in Fig. 4.5a, for a typical class-E DPA the output amplitude at ACW = 1 is ~3× the amplitude of a linear DPA. Thus, even by using ideal LUT-DPD to linearize a nonlinear DPA, the DR is still 10-12 dB less than a linear DPA with the same number of bits. Compensating for this loss of DR requires at least two extra bits in the DPA which increases the complexity of the preceding digital circuitries and the DPD block. Furthermore, as  $K_{NL}$  increases, the benefit of nonlinear sizing in the DR with the same resolution also increases.
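To give a feeling for the numbers in (4.12), the short sketch below evaluates the extra dynamic range and its approximation for a 9-bit DPA. The nonlinearity factor used here is a hypothetical value picked so that the linearly sized array shows roughly the 3-4× amplitude excess at ACW = 1 discussed above.

```python
import math


def delta_dr_db(k_nl, acw_max):
    """Extra dynamic range gained by nonlinear sizing, Eq. (4.12), in dB."""
    return 20.0 * math.log10((1.0 + k_nl * acw_max) / (1.0 + k_nl))


K_NL = 0.005     # hypothetical nonlinearity factor
ACW_MAX = 511    # 9-bit DPA

exact = delta_dr_db(K_NL, ACW_MAX)
approx = 20.0 * math.log10(1.0 + K_NL * ACW_MAX)   # right-hand side of (4.12)
print(f"exact = {exact:.2f} dB, approximation = {approx:.2f} dB")
```

For this assumed $K_{NL}$ the result is about 11 dB, i.e. close to two bits of resolution, and the approximation differs from the exact expression by only a few hundredths of a dB.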

In a practical design where $q_D \neq 1$ and $K_m \neq 1$, it is easier to extract $W_{eff}$ by simulating the ACW-AM curve of a linearly sized DPA and then inverting, normalizing, and multiplying it by $ACW_{Max}W_0$. However, unlike a linearly sized DPA, where the first difference of the total effective size is constant and equal to one LSB unit cell size ($W_0$), here the first difference of the total effective size given by (4.11) has a different value for each ACW > 0, as follows:

$$W_{Device,NL}[ACW] = \operatorname{diff}(W_{eff,NL}(ACW)) = W_{eff,NL}[ACW] - W_{eff,NL}[ACW-1]$$
(4.13)



Figure 4.9: Concept of overdrive-voltage tuning technique to control the linearity of the ACW-AM curve.

This means that, in order to implement a fully nonlinearly sized *N*-bit DPA, we need $2^N - 1$ different devices, which is not only very labor intensive but would also result in high power consumption in the driver stages. In order to benefit from the commonly used binary-unary segmentation to reduce the power consumption of the drivers, *segmented nonlinear sizing* can be devised instead of fully nonlinear sizing. In a segmented nonlinearly sized DPA, the $W_{eff,NL}(ACW)$ curve is divided into $N_{Seg}$ segments in which the effective size $W_{eff,NL,i}$ of the activated transistors in the i<sup>th</sup> segment increases linearly, resulting in a piecewise-linear approximation of $W_{eff,NL}$ as shown in Fig. 4.7a. Thus, $W_{eff,NL,i} = W_i(ACW - P_{i-1})$, in which $W_i$ is the unit size of the i<sup>th</sup> segment and $P_{i-1}$ is the sum of the ACW ranges $(\Delta P_j)$ of the previous segments, i.e., $P_i = \sum_{j=1}^i \Delta P_j$ and $P_0 = 0$. By knowing $W_{eff,NL}(ACW)$ either analytically (from (4.11)) or experimentally, we can calculate $W_i$ for i > 0 as follows:

$$W_{i} = \frac{W_{eff,NL}[P_{i}] - W_{eff,NL}[P_{i-1}]}{\Delta P_{i}}$$
(4.14)

In order to decrease the complexity of an *N*-bit DPA array, we select the number of segments as a power of two, i.e.,  $N_{Seg} = 2^m$ , and choose the same range for all segments equal to  $\Delta P = 2^{N-m} = 2^n$ . Therefore, the array is implemented in  $2^m$  rows (segments), and they can be fully realized by unary cells, binary cells, or by a combination of both. Segmented nonlinear sizing results in a quasi-linear ACW-AM curve, as simulated and depicted in Fig. 4.8a, whereby the linearity depends on the number and the range of the segments. Assuming  $2^m$  similar range segments and no ACW-PM distortion, the output power spectral density (PSD) of a nonlinearly sized DPA is plotted in Fig. 4.8b for different numbers of segments. As can be seen, in order to create enough margin for other sources of nonidealities, 8 segments are sufficient to reach an acceptable linearity.
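The segment unit sizes of (4.14) can be sketched as follows for equal segment ranges $\Delta P = 2^{N-m}$. This is only an illustration: the ideal curve is taken from the analytical expression (4.11), the full scale is assumed to be $2^N$ so that the segments tile the code range exactly, and $K_{NL}$ is a hypothetical value. The helper names are not from the original work.

```python
def w_eff_nl(acw, full_scale, k_nl, w0=1.0):
    """Ideal nonlinear effective width, Eq. (4.11)."""
    return w0 * acw / (1.0 + k_nl * (full_scale - acw))


def segment_unit_sizes(n_bits, m_bits, k_nl, w0=1.0):
    """Unit size W_i of each of the 2**m equal-range segments, Eq. (4.14)."""
    full_scale = 2**n_bits              # assumption: segments tile 0 .. 2**N
    delta_p = 2**(n_bits - m_bits)      # ACW range of every segment
    bounds = [i * delta_p for i in range(2**m_bits + 1)]   # P_0 .. P_Nseg
    return [
        (w_eff_nl(bounds[i], full_scale, k_nl, w0)
         - w_eff_nl(bounds[i - 1], full_scale, k_nl, w0)) / delta_p
        for i in range(1, len(bounds))
    ]


def w_eff_segmented(acw, unit_sizes, delta_p):
    """Piecewise-linear effective width: full segments plus a partial one."""
    full_segments, remainder = divmod(acw, delta_p)
    width = delta_p * sum(unit_sizes[:full_segments])
    if remainder:
        width += remainder * unit_sizes[full_segments]
    return width


K_NL = 0.005   # hypothetical nonlinearity factor
sizes = segment_unit_sizes(n_bits=9, m_bits=3, k_nl=K_NL)
print([round(w, 3) for w in sizes])
```

The unit sizes grow from the first to the last segment, and the piecewise-linear curve coincides with the ideal curve at every segment boundary; between boundaries it is a chord approximation whose error shrinks as the number of segments increases.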

### 4.3.2. OVERDRIVE VOLTAGE TUNING FOR PVT COMPENSATION

The accuracy of the nonlinear sizing technique depends on the accuracy of the calculated or simulated $W_{eff}$-AM curve of the DPA. However, as predicted by (4.2)-(4.3), in practice, this curve varies with any process/voltage/temperature (PVT) variations, which change $R_{D0}$. In addition, any change in the carrier frequency, the load network, or the antenna impedance that can be modeled as a variation in $R_{LP}$ results in degradation of the linearity of the ACW-AM curve. For small variations in $V_{DD}$ (< 10%), the normalized ACW-AM is almost the same. However, for larger variations, the linearity of the ACW-AM curve degrades since the ranges of the current-source mode and resistive-mode regions change. Nonetheless, any PVT/frequency/load variations can generally be modeled as a variation in the nonlinearity factor $K_{NL}$, which can be corrected by tuning $R_{D0}$. Therefore, as predicted by the following simplified equation, we can linearize the normalized ACW-AM curve again by correcting $K_{NL}$:

$$AM_{NORM}(W_{eff}) \cong \frac{W_{eff}}{W_{eff,Max}} \left( \frac{W_{eff,Max}K_{NL} + W_0}{W_{eff}K_{NL} + W_0} \right)$$
(4.15)

Assuming $V_{DD} < V_{OD}$, where $V_{OD} = V_{GS} - V_{TH}$ is the overdrive voltage of the DPA output transistors, $R_{D0}$ is given by $R_{D0} = (\frac{W}{L} \times K_n \times V_{OD})^{-1}$ [27]. Thus, in order to tune $R_{D0}$ and correct $K_{NL}$, the overdrive voltage can be tuned by changing the amplitude of the RF clock applied to the gate of the transistors (i.e. $V_{GS}$). This is feasible by tuning the DC supply of the buffers driving the transistors. In Fig. 4.9, the concept of the overdrive tuning technique to control the linearity of the ACW-AM curve by making it more concave (increasing overdrive voltage) or convex (decreasing overdrive voltage) is shown. For example, suppose the DPA is designed by nonlinear sizing to be linear at an ambient temperature of $T_0$ and in the TT process corner, but the chip is fabricated in the FF process corner or, during operation, the temperature is below $T_0$; in either case, $R_{D0}$ decreases. Therefore, to correct $K_{NL}$, the $V_{DD}$ of the buffers driving the output transistor should decrease in order to lower the $V_{OD}$ and increase $R_{D0}$. In Fig. 4.10a, the simulation results of such a scenario for the process variation from the TT corner to the FF process corner are depicted, showing a decrease of less than 0.1 dB in the output power. In Fig. 4.10b, the simulated effect of the temperature variation is shown, which indicates a negligible impact on the linearity.
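A minimal sketch of the overdrive correction, based only on the triode expression for $R_{D0}$ given above: when the transconductance parameter $K_n$ increases (e.g. in a fast corner), the overdrive voltage is lowered until $R_{D0}$, and therefore $K_{NL}$, returns to its nominal value. All device numbers are hypothetical, and the threshold voltage is assumed not to shift between corners, which is a simplification.

```python
def r_d0(w_over_l, k_n, v_od):
    """Triode on-resistance of a unit device: R_D0 = 1 / ((W/L) * K_n * V_OD)."""
    return 1.0 / (w_over_l * k_n * v_od)


def v_od_for_target(r_target, w_over_l, k_n):
    """Overdrive voltage that yields the requested R_D0."""
    return 1.0 / (w_over_l * k_n * r_target)


# Hypothetical numbers, for illustration only.
W_OVER_L = 10.0
K_N_TT = 4.0e-4            # A/V^2, nominal corner
K_N_FF = 1.1 * K_N_TT      # assumed 10 % higher in the fast corner
V_TH = 0.4                 # V, assumed identical in both corners
V_GS_NOM = 1.1             # V, nominal buffer supply
LDO_STEP = 0.010           # V, one step of the buffer supply

r_nominal = r_d0(W_OVER_L, K_N_TT, V_GS_NOM - V_TH)
v_od_new = v_od_for_target(r_nominal, W_OVER_L, K_N_FF)
delta_v = v_od_new - (V_GS_NOM - V_TH)      # negative: supply must be lowered
ldo_steps = round(delta_v / LDO_STEP)
print(f"R_D0 = {r_nominal:.0f} ohm, delta V = {delta_v * 1e3:.1f} mV, steps = {ldo_steps}")
```

Under these assumptions the buffer supply has to drop by a few tens of millivolts, i.e. a handful of steps of a regulator with ~10 mV resolution.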



Figure 4.10: Simulated (a) ACW-AM curves of a scenario showing how to correct for the process variation from the TT to the FF corner by controlling the overdrive voltage, and (b) the effect of temperature variation on normalized ACW-AM and ACW-PM conversion curves.



Figure 4.11: (a) Basic concept of multiphase RF clocking, and (b) the resulting simulated phase distortions of a DPA with conventional single-phase RF clocking and multiphase RF clocking.

# **4.3.3.** MULTIPHASE RF CLOCKING

In a conventional DPA, all of the transistors are driven by the same modulated RF clock in which the phase is dynamically modified either digitally (by DPD) [3–7, 10, 11] or in the analog PM path [9] to correct for the ACW-PM distortion. This phase distortion is translated into an ACW-dependent delay in the time domain as shown in Fig. 4.2b. In order to avoid modifying the phase information for each ACW, in this work, multiple RF clocks with different but fixed delay offsets are applied to the DPA cells. In Fig. 4.11a, the basic concept of *multiphase RF clocking* is shown.

Figure 4.12: (a) Simplified LTI model of multiphase RF clocking, phasor representation of the output signal and the currents of each segment for (b) a conventional DPA, (c) a DPA with multiphase RF clocking requiring positive phase offsets, and (d) a DPA with multiphase RF clocking with negative phase offsets implementable by positive delay offsets.

The DPA array is divided into a few segments (which are typically but not necessarily the same segments used for segmented nonlinear sizing). Each segment is driven by an RF clock with a delay offset different from that of the other segments. In Fig. 4.11b, the resulting simulated phase distortions of a DPA with conventional single-phase RF clocking and multiphase RF clocking are depicted. By knowing the phase offset of each segment $(\Delta \theta_i)$ at a carrier frequency of $f_C$, the delay offset of that segment is calculated as $\Delta T_i = \Delta \theta_i / (360^\circ \times f_C)$, since one carrier period $1/f_C$ corresponds to $360^\circ$.
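The phase-to-delay conversion can be illustrated with a few lines of code. One carrier period corresponds to 360°, so a phase offset in degrees is divided by $360^\circ \times f_C$; the example phase value and frequencies below are arbitrary.

```python
def phase_to_delay(delta_theta_deg, f_c_hz):
    """Delay (s) equivalent to a phase offset (degrees) at carrier frequency f_c."""
    return delta_theta_deg / (360.0 * f_c_hz)


for f_c in (2.0e9, 2.5e9):
    delay_ps = phase_to_delay(5.5, f_c) * 1e12
    print(f"{f_c / 1e9:.1f} GHz: 5.5 deg corresponds to {delay_ps:.2f} ps")
```

A phase step of a few degrees in the 2-2.5 GHz range therefore maps onto a delay step of only several picoseconds.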

Equation (4.2) can be rewritten for the output voltage as a product of the transistors' current, modeled as current sources, and the trans-impedance seen by that current source, including the output resistance of the transistors, as follows:

$$V_{OUT}(K_W) = I_{H1} \times K_W \times \{|Z(K_W)| \angle \Phi_Z(K_W)\}$$

$$(4.16)$$

where $|Z(K_W)|$ is the absolute value of the trans-impedance function and $\angle \Phi_Z(K_W)$ is its phase response, both as functions of the sizing factor $K_W$. In a simplified LTI (Norton equivalent) model for multiphase RF clocking with *N* segments, we can replace each segment with a current source at the fundamental frequency with a phase offset ($\Delta \theta_i$) and an amplitude proportional to the sizing factor of that segment ($K_{Wi}$), as shown in Fig. 4.12a. Thus, by using the superposition theorem, the output voltage is given by:

$$V_{OUT}(K_W) = \{ \sum_{i=1}^N |I_{H1}| \angle \Delta \theta_i . K_{Wi} \} [|Z(K_W)| \angle \Phi_Z(K_W)] = \sum_{i=1}^N S_i(K_W)$$
(4.17)

where $K_W = \sum_{i=1}^N K_{Wi}$, and $S_i(K_W) = |I_{H1}| \times K_{Wi} \times |Z(K_W)| \angle [\Delta \theta_i + \Phi_Z(K_W)]$ is the output voltage phasor contributed by the i<sup>th</sup> segment as depicted in Fig. 4.12(b)-(d). The value of $K_{Wi}$ represents the effective size of the enabled transistors in the i<sup>th</sup> segment normalized to the unit size $W_0$. Therefore, when Segment 1 is fully switched on, the output phasor is equal to $S_1(K_{W1})$, whose phase is equal to $\Delta \theta_1 + \Phi_Z(K_{W1})$. When Segment 2 is also fully switched on, the output phasor is equal to $S_{1-2} = S_1(K_{W1-2}) + S_2(K_{W1-2}) = |S_{1-2}| \angle \Delta\theta_{1-2}$, where $K_{W1-2} = K_{W1} + K_{W2}$. Its phase $\Delta\theta_{1-2}$ is calculated as:

$$\Delta\theta_{1-2} = \Delta\theta_2 + \Phi_Z(K_{W1-2}) + \arctan\left(\frac{|S_1(K_{W1-2})|.\sin(\Delta\theta_1 - \Delta\theta_2)}{|S_1(K_{W1-2})|.\cos(\Delta\theta_1 - \Delta\theta_2) + |S_2(K_{W1-2})|}\right) \quad (4.18)$$

Hence, in order to rectify the phase distortion, by equating  $\Delta \theta_{1-2}$  to  $\Delta \theta_1 + \Phi_Z(K_{W1})$ ,  $\Delta \theta_2$  is obtained as follows:

$$\Delta \theta_2 = \Delta \theta_1 + \arcsin\left(\frac{|S_1(K_{W1-2})|}{|S_2(K_{W1-2})|}\sin[\Phi_Z(K_{W1}) - \Phi_Z(K_{W1-2})]\right) + \Phi_Z(K_{W1}) - \Phi_Z(K_{W1-2})$$
(4.19)

In general, if $K_{W1-i} \stackrel{\text{def}}{=} \Sigma_{j=1}^{i} K_{Wj}$ and $S_{1-i}(K_{W1-i}) \stackrel{\text{def}}{=} \Sigma_{j=1}^{i} S_j(K_{W1-i})$, by knowing $\Delta \theta_1$ to $\Delta \theta_{(i-1)}$, and applying the same procedure, the phase offset $\Delta \theta_i$ is calculated as follows:

$$\Delta \theta_{i} = \Delta \theta_{1} + \arcsin\left(\frac{|S_{1-(i-1)}(K_{W1-i})|}{|S_{i}(K_{W1-i})|} \sin[\Phi_{Z}(K_{W1}) - \Phi_{S1-(i-1)} + \Delta \theta_{1}]\right) + \Phi_{Z}(K_{W1}) - \Phi_{Z}(K_{W1-i})$$
(4.20)



Figure 4.13: (a) Flowchart of delay offset optimization for ACW-PM correction, and (b) the simulated effect of multiphase RF clocking on ACW-AM conversion.

where $\Phi_{S,1-(i-1)} \stackrel{\text{def}}{=} \angle S_{1-(i-1)}(K_{W1-i})$. Since the phase offsets calculated by (4.20) are positive, as shown in Fig. 4.12c, they are not suited for implementation by delaying the RF clocks, as they are equivalent to negative delay offsets. In order to make the phase offsets implementable by delay lines, the largest phase offset should not exceed zero. Thus, for a class-E or semi class-E/F<sub>2</sub>, by having $\Delta \theta_N = 0$, as shown in Fig. 4.12d, the phase offset of the first RF clock is given by:

$$\Delta\theta_{1} = -\Phi_{Z}(K_{W1}) + \arctan\left(\frac{|S_{N}(K_{W1-N})|.\sin(\Phi_{Z}(K_{W1-N})) + |S_{1-(N-1)}(K_{W1-N})|.\sin(\Phi_{S1-(N-1)})}{|S_{N}(K_{W1-N})|.\cos(\Phi_{Z}(K_{W1-N})) + |S_{1-(N-1)}(K_{W1-N})|.\cos(\Phi_{S1-(N-1)})}\right)$$
(4.21)

With $\Delta\theta_1$ from the above equation, the other phase offsets can be calculated from (4.20). For a DPA with *N*-phase RF clocks, N-1 steps are required to estimate all of the phase/delay offsets. In practice, the delay offsets can be found by using an iterative algorithm as shown in Fig. 4.13a. In this algorithm, in each iteration of the outer loop, the phase errors of segments 1 to (N-1) are measured with respect to the phase of segment *N*, converted into delay codes, and then programmed into the chip. This loop is typically repeated four to five times until the root-mean-square (RMS) of the measured phase errors is less than 1°. Once the ACW-PM is flattened, the normalized ACW-AM curve is almost the same as that of a single-phase nonlinearly sized DPA. In Fig. 4.13b, the simulated effect of multiphase RF clocking on the ACW-AM curve is shown.

Figure 4.14: Capacitive harmonic tuning for efficiency enhancement (a) circuit, and (b) power and efficiency simulation vs. duty cycle.

Furthermore, by using this technique, due to the intrinsic weighted phase averaging at the output, the total phase error inside each segment is significantly reduced. For example, as shown in Fig. 4.11b, the total phase error of Segment 3 is reduced from 10° to 2° by employing multiphase RF clocking.
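The segment-by-segment use of (4.20) can be sketched with phasor arithmetic as below. This is an illustration of the LTI model only, not of the measurement-based calibration loop: the segment sizes and the trans-impedance phases are hypothetical, $|Z|$ is omitted because it is common to all segments at a given $K_W$, and the offsets are finally shifted by their largest value so that none is positive (which coincides with $\Delta\theta_N = 0$ when the last segment needs the largest offset).

```python
import cmath
import math


def phase_offsets(k_w, phi_z, theta_1=0.0):
    """Phase offset of every segment from Eq. (4.20); all angles in radians.

    k_w[i]   : sizing factor K_Wi of segment i + 1
    phi_z[i] : trans-impedance phase Phi_Z with segments 1 .. i + 1 fully enabled
    """
    thetas = [theta_1]
    current = k_w[0] * cmath.exp(1j * theta_1)    # summed current phasor so far
    for i in range(1, len(k_w)):
        phi_s = cmath.phase(current) + phi_z[i]   # phase of S_{1-(i-1)} at K_W1-i
        ratio = abs(current) / k_w[i]             # |S_{1-(i-1)}| / |S_i|, |Z| cancels
        theta_i = (theta_1
                   + math.asin(ratio * math.sin(phi_z[0] - phi_s + theta_1))
                   + phi_z[0] - phi_z[i])
        thetas.append(theta_i)
        current += k_w[i] * cmath.exp(1j * theta_i)
    return thetas


def boundary_phases(k_w, phi_z, thetas):
    """Output phase with segments 1 .. i fully enabled, for every i."""
    phases = []
    current = 0j
    for k, phi, theta in zip(k_w, phi_z, thetas):
        current += k * cmath.exp(1j * theta)
        phases.append(cmath.phase(current) + phi)
    return phases


# Hypothetical segment sizes and phases (3 degrees of extra lag per segment).
K_W = [40.0, 44.0, 49.0, 55.0, 62.0, 71.0, 83.0, 98.0]
PHI_Z = [math.radians(-3.0 * (i + 1)) for i in range(len(K_W))]
F_C = 2.4e9

thetas = phase_offsets(K_W, PHI_Z)
largest = max(thetas)
thetas_delay = [t - largest for t in thetas]                          # all <= 0
delays_ps = [-t / (2 * math.pi * F_C) * 1e12 for t in thetas_delay]   # all >= 0
print([round(d, 1) for d in delays_ps])
```

Because a common rotation of all clocks does not change their relative phases, the shifted offsets keep the output phase identical at every segment boundary while being realizable as non-negative delays.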

# **4.3.4.** HARMONIC TUNING FOR EFFICIENCY ENHANCEMENT

In a typical class-E DPA, multiphase RF clocking does not degrade the average drain efficiency (DE) for signals with a peak-to-average power ratio (PAPR) > 6 dB. However, depending on the load network conditions, it may slightly degrade the peak DE. By using a capacitor ($C_C$) between the differential drains of the push-pull DPA, as shown in Fig. 4.14a, the impedance of odd and even modes can be tuned from a typical class-E PA more toward a class-E/F<sub>2</sub> condition [24]. By doing so, the peak DE is enhanced such that it returns to the level of a single-phase class-E DPA. However, the sensitivity of the power and efficiency to duty cycle variations (and timing mismatches) may increase. In this work, by properly optimizing $C_C$, not only is the peak DE enhanced, but the sensitivity of the peak P<sub>OUT</sub> and the DE to variations in the duty cycle (and timing mismatches) is also improved compared to a single-phase class-E DPA, as shown by the simulation results in Fig. 4.14b.

# 4.4. IMPLEMENTATION

### 4.4.1. CLASS-E DPA WITH ON-CHIP MATCHING NETWORK

The 9-bit linear polar DPA is designed and fabricated in 40 nm bulk CMOS with a core area of 1 mm×0.45 mm. The overall block diagram of the proposed DPA, the related circuit of the sub-PA cells, and the single-ended to differential converters for the RF clocks are shown in Fig. 4.15, and the chip micrograph is depicted in Fig. 4.16. The DPA consists of two identical push-pull arrays which are configured in an 8-row × (16+3) column pattern. Clock gating is applied to the row drivers to enhance the efficiency at power back-off levels. Each row contains one segment of an 8-segment nonlinearly sized DPA. Similar to a typical segmented digital-to-analog converter (DAC) [28], each segment consists of 16 MSB cells which are addressed by the first four most significant bits of the column decoder and 3 LSB cells which are addressed by the two least significant bits of the column decoder. In each segment, the size of the MSB cells is 1/16 of the total size of that segment, while the size of the LSB cells is 1/64 of the total size of the output transistor.
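One possible way to picture the addressing of the 8-row × (16+3) array is the field split sketched below. The bit assignment (upper three bits for the row, next four bits for the MSB cells, lowest two bits for the LSB cells) is an assumption made for illustration; the text only states how many bits the column decoder uses for the MSB and LSB cells. Rows below the selected one are assumed to be fully enabled.

```python
def decode_acw(acw):
    """Split a 9-bit ACW into (row, msb_cells, lsb_cells) under the assumed mapping."""
    if not 0 <= acw <= 511:
        raise ValueError("ACW must fit in 9 bits")
    row = acw >> 6                  # upper 3 bits: partially enabled row (0..7)
    msb_cells = (acw >> 2) & 0xF    # middle 4 bits: MSB cells enabled in that row
    lsb_cells = acw & 0x3           # lower 2 bits: LSB cells enabled in that row
    return row, msb_cells, lsb_cells


for code in (0, 63, 64, 300, 511):
    print(code, decode_acw(code))
```

With this split, each row spans 64 codes, one MSB cell is worth four codes, and the three LSB cells cover the remaining steps within an MSB cell.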

In order to facilitate the overdrive tuning technique, a 6-bit digitally programmable low-dropout (LDO) regulator is designed and implemented on-chip. It is capable of driving 50 mA with a resolution of 10-12 mV and a settling time of ~300 ns as illustrated in Fig. 4.17a. The reference voltage of the LDO is provided by a 6-bit R-2R DAC. There is only one LDO regulator on the chip that supplies all the drivers in the entire push-pull DPA array, although it does not supply the delay offset blocks. The supply voltage tuning can change the delay of the drivers. However, for example, with a 10 mV variation in the VDD of the buffers (coming from LDO), the delay of the smallest buffer chain (driving the smallest output transistor) changes about 3 ps and the delay of the largest buffer chain (driving the largest output transistor) changes about 3.2 ps. As a result, the change in the delay of the buffers is almost the same for the entire DPA. This change in the delay manifests itself as a phase offset at the output, which is not an issue as long as it does not change during the transmission of a data packet. This effect on the ACW-PM linearity is negligible for VDD variations of less than 100 mV, although it may change the ACW-PM curve slightly. In Fig. 4.17b, the effect of tuning the overdrive voltage on the output signal is shown where the down-converted IQ trajectory of a triangle ACW signal (without phase modulation) is measured for different LDO settings that vary the VDD of the buffers from 1.1 V to 1.19 V. Therefore, by changing the LDO settings less than 10 LSB,


Figure 4.15: (a) Overall block diagram of the proposed DPA, (b) the circuit of sub-PA, and (c) the single-ended to differential converter.

the ACW-PM linearity remains intact and there is no need to retune the delay offsets settings. The input phase-modulated RF clock is amplified on-chip and subsequently fed to the multiphase RF clocking circuit. This block generates five separate differential RF clocks with optimized delay offsets and, at the end, simultaneously applies them to the corresponding DPA segments. In order to compensate for PVT, frequency, and load variations, and effect on the ACW-PM correction, the resolution of the phase offsets should be about  $5 - 6^{\circ}$  which translates into an approximate 6.5 ps delay for the RF range of 2-2.5 GHz. This resolution is less than half of the absolute delay of a minimum sized inverter in 40 nm CMOS technology. To overcome this limitation, each delay offsets is



Figure 4.16: Chip micrograph (core area =  $1 \text{ mm} \times 0.45 \text{ mm}$ ).



Figure 4.17: (a) 6-bit digitally programmable on-chip LDO designed for overdrive-voltage tuning, and (b) IQ trajectory of the effect of tuning the LDO setting on ACW-PM linearity.



Figure 4.18: Structure of the 4-bit fine-resolution delay line and its delay-cells.

implemented with a 4-bit digitally programmable fine resolution delay line based on the relative delay of current-starved inverters [29], as shown in Fig. 4.18. The absolute delay of each delay cell is controlled with a single bit by enabling or disabling NMOS and PMOS transistors in series with the  $V_{DD}$ /GND paths. The RF clock passes through 15 cascaded delay cells to arrive at the output, resulting in a total relative delay of 97 ps with a resolution of ~6.5 ps, which is sufficient to compensate for the practical variations.
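The relation between the required phase resolution and the delay resolution can be checked with a short calculation. The following is a minimal sketch, not the on-chip implementation: the function names are hypothetical, and the delay line is assumed to map its 4-bit code linearly onto delay in steps of one LSB.

```python
def phase_to_delay_ps(phase_deg: float, f_hz: float) -> float:
    """Delay in picoseconds that corresponds to a phase shift at frequency f_hz."""
    return phase_deg / 360.0 / f_hz * 1e12


def delay_line_ps(code: int, lsb_ps: float = 6.5, bits: int = 4) -> float:
    """Relative delay of a digitally programmable delay line.

    Assumption: the code maps linearly onto delay, one LSB per step.
    """
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for the given number of bits")
    return code * lsb_ps


# Phase resolution of 5-6 degrees over the 2-2.5 GHz range (values from the text).
for f_ghz in (2.0, 2.5):
    for phase in (5.0, 6.0):
        delay = phase_to_delay_ps(phase, f_ghz * 1e9)
        print(f"{phase:.0f} deg at {f_ghz} GHz -> {delay:.2f} ps")

# Full-scale relative delay for the highest 4-bit code.
print(f"full scale: {delay_line_ps(15):.1f} ps")
```

Under these assumptions the four corner cases fall between roughly 5.6 ps and 8.3 ps, which brackets the ~6.5 ps resolution quoted above, and the full-scale value of 15 × 6.5 ps agrees with the stated total relative delay of about 97 ps.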



Figure 4.19: AM/PM timing mismatch correction by (a) coarse delay line, and (b) digital FIR filter implemented as a fractional delay.

The ACW data are stored in an on-chip 4K SRAM running at 625 MHz. In order to compensate for the timing mismatch between the ACW and PM paths, which significantly degrades the EVM and ACPR for wideband signals [5, 7], two different techniques are utilized, either separately or simultaneously. The first is a 4-bit programmable delay line comprising 15 cascaded delay cells, as shown in Fig. 4.19a, with a resolution of ~30 ps and a total range of ~450 ps, which is placed in the path of the baseband sampling clock of the ACW registers. The second is a digital 10-tap FIR filter implemented on-chip as a fractional delay element in the digital path of the ACW, as shown in Fig. 4.19b. The coefficients of the filter are given by $h[n] = \frac{\sin[\pi(n-\Delta \cdot F_S)]}{\pi(n-\Delta \cdot F_S)}$ [30], in which $n$ is the index of the tap coefficient and $\Delta$ is the desired delay, which is not necessarily an integer multiple of $1/F_S$. Therefore, while the registers of the filter are clocked with a frequency of $F_S$, the output codes are delayed by a fraction of $1/F_S$, which is the group delay of the digital FIR filter.
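The coefficient formula above can be illustrated with a few lines of code. This is a minimal sketch of a truncated-sinc fractional-delay filter, not the on-chip design: the function names are hypothetical, no windowing or coefficient quantization is applied, and the example delay of 4.3 sample periods (a bulk delay of four samples plus a fraction) is an assumed value.

```python
import math


def fractional_delay_taps(delay_s: float, fs_hz: float, num_taps: int = 10) -> list:
    """Tap coefficients h[n] = sin(pi (n - D)) / (pi (n - D)) with D = delay_s * fs_hz."""
    d = delay_s * fs_hz  # desired delay expressed in sample periods
    taps = []
    for n in range(num_taps):
        x = n - d
        if abs(x) < 1e-12:
            taps.append(1.0)  # limit of sin(pi x) / (pi x) for x -> 0
        else:
            taps.append(math.sin(math.pi * x) / (math.pi * x))
    return taps


def fir_filter(samples: list, taps: list) -> list:
    """Direct-form FIR filter; samples before the start of the list are taken as zero."""
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for n, h in enumerate(taps):
            if k - n >= 0:
                acc += h * samples[k - n]
        out.append(acc)
    return out


FS = 625e6                                      # ACW sampling rate from the text
taps = fractional_delay_taps(4.3 / FS, FS)      # assumed delay: 4.3 sample periods
print([round(h, 4) for h in taps])
```

For an integer delay the taps collapse to a unit impulse, so the filter reduces to a plain register delay; a non-integer delay spreads the energy over neighbouring taps, which is what lets codes clocked at $F_S$ be shifted by a fraction of $1/F_S$.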

A transformer with a 1:3 turn ratio is implemented as the balun. In Fig. 4.20, the layout of the balun with and without the shields is shown. The unconventional pad configuration is due to area limitations. The EM simulation results, including the magnetic coupling factor $K_m$, primary and secondary inductances $L_P$ and $L_S$, winding resistances $R_P$ and $R_S$, quality factors $Q_P$ and $Q_S$, real load impedance $R_{LP}$ seen at the differential input, and passive efficiency $\eta_{passive}$, are plotted versus frequency in Fig. 4.21. At 2 GHz, these parameters are $K_m = 0.756$, $L_P = 0.41$ nH, $L_S = 3.1$ nH, $R_P = 0.5\,\Omega$, $R_S = 5.5\,\Omega$, $Q_P = 10.2$, $Q_S = 7.1$, $R_{LP} = 1.73\,\Omega$, and $\eta_{passive} = 73.3\%$, respectively. The extracted parasitic capacitance of the drain ($C_{PAR}$) is 0.82 pF. The device capacitance is about 2.3–2.4 pF, the external capacitance added to the drains is 5 pF, and the coupling capacitance between the differential drains is also 5 pF. The balun with the entire load network of the DPA, including all the capacitances, is EM simulated. The simulation results are shown



Figure 4.20: (a) Layout of the balun, and (b) final design with ground shielding and output pads.

in Fig. 4.22. The loaded quality factor, calculated as $\frac{f(|Z_{IN}| = \max)}{\Delta f_{-3\,\mathrm{dB}}}$, is about 1.8. As can be seen, for $f > 2.8$ GHz the input loaded reactance is negative, which means that for fundamental frequencies of 1.4 GHz < $f_C$ < 2.8 GHz, the input loaded reactance at all higher harmonics is negative ($-6j < X_{IN} < 0$) and comparable to the fundamental input load resistance, as required by class-E load conditions [31].
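Two of the statements above lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions rather than part of the EM analysis: it uses the simple series model $Q = \omega L / R$ for the windings (the EM simulator may define Q differently), and the helper names are hypothetical.

```python
import math


def winding_q(l_henry: float, r_ohm: float, f_hz: float) -> float:
    """Unloaded quality factor of a winding, assuming the series model Q = w L / R."""
    return 2.0 * math.pi * f_hz * l_henry / r_ohm


def harmonics_above(f_c_hz: float, f_min_hz: float, max_order: int = 5) -> bool:
    """True if every harmonic from the 2nd up to max_order lies above f_min_hz."""
    return all(n * f_c_hz > f_min_hz for n in range(2, max_order + 1))


# Balun values at 2 GHz as quoted in the text.
q_p = winding_q(0.41e-9, 0.5, 2e9)   # about 10.3, close to the quoted 10.2
q_s = winding_q(3.1e-9, 5.5, 2e9)    # about 7.1
print(f"Q_P = {q_p:.1f}, Q_S = {q_s:.1f}")

# The reactance is negative above 2.8 GHz, so any carrier above 1.4 GHz
# places all of its higher harmonics in the capacitive region.
for f_c in (1.3e9, 1.5e9, 2.2e9, 2.7e9):
    print(f"f_C = {f_c / 1e9:.1f} GHz -> harmonics capacitive: {harmonics_above(f_c, 2.8e9)}")
```

The lower bound of 1.4 GHz follows directly from requiring the second harmonic to exceed 2.8 GHz.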

#### 4.4.2. CLASS-E DPA WITH OFF-CHIP MATCHING NETWORK

Typically, a DPA implemented with an on-chip matching network (MN) cannot achieve a drain efficiency higher than 50 %, while higher efficiencies are feasible with an off-chip matching network. Therefore, in order to achieve a drain efficiency higher than 50 % over a wide RF bandwidth, a wideband compensated Marchand balun using re-entrant coupled lines is designed and fabricated. For this balun, a completely different DPA, with different nonlinear sizing and multiphase delay offset parameters, is designed and fabricated on the same die as the one with the on-chip MN. In Fig. 4.23, the chip micrograph of the DPA designed for the off-chip MN is shown.

The planar Marchand balun is an attractive transmission-line-based balun topology due to its wideband amplitude and phase balance and its relatively easy implementation [32, 33]. The conventional planar Marchand balun consists of two symmetrical $\lambda/4$



Figure 4.21: Electromagnetic (EM) simulation results of the on-chip balun.



Figure 4.22: EM simulation results of the loaded input impedance of the on-chip balun.

coupled lines with open- and short-circuited terminations at specified ports to provide a balanced loading condition resulting from a single-ended load. However, directly employing the coupled lines in a practical Marchand balun will lead to an imperfect conversion from the unbalanced signal into the balanced signal due to unequal even- and odd-mode phase velocities. To address this issue, a compensation technique [33] is adopted to reduce the imbalance of the balun. The details of this technique will be described in Chapter 5. In order to obtain a wideband Marchand balun at a low impedance level, tight coupling with high even-mode impedance is required. Re-entrant coupled lines are used to achieve tight coupling without strict requirements on circuit fabrication [32]. Employing this structure with proper dielectric constant and reasonable layer thickness between



Figure 4.23: Chip micrograph showing the DPA designed for the off-chip MN.



Figure 4.24: Conceptual structure of the off-chip MN with a compensated Marchand balun as well as the connection of the DPA to the MN with the realized parallel and series resonators.

the conductors achieves the expected tight coupling, yielding a very low-loss wideband balun. By combining the core network of the Marchand balun with a differential re-entrant type impedance inverter (total length $\lambda/4$) and second harmonic impedance control, wideband class-E digital PA performance can be facilitated. To achieve this, the network impedance provided to the push-pull digital class-E PA should create an open condition for all higher harmonics, and in particular for the second harmonic. Therefore, by creating a short-circuit termination at $\lambda_{even}/8$, an open circuit at the reference plane of the DPA can be achieved for the second harmonic. In Fig. 4.24, the conceptual structure of the off-chip MN implemented as a compensated Marchand balun, as well as the connection of the DPA to the MN and the realized parallel and series resonators, are shown.

The re-entrant coupled lines with different dielectric constants create different even-



Figure 4.25: (a) Compensated Marchand balun with second harmonic termination implemented by a via, and (b) the measured and simulated differential-to-single-ended transmission loss.

and odd-mode impedances between the layers as well as different effective dielectric constants ($\epsilon_{reff}$). Therefore, since $\lambda = \lambda_{air}/\sqrt{\epsilon_{reff}}$, $\lambda_{even}$ and $\lambda_{odd}$ can be made different from each other. In this design, $\lambda_{even} = 59$ mm and $\lambda_{odd} = 36$ mm, thus $\lambda_{even}/8 \approx 7.4$ mm and $\lambda_{odd}/4 = 9$ mm are close to each other. The details of this technique will be described in Chapter 5. The required even-mode second harmonic short-circuit condition in the re-entrant coupled lines is realized by adding a simple via from the floating middle-layer conductor to ground at the position where the even-mode electrical length for the second harmonic $2f_0$ equals $\lambda_{even}/8$. Due to the tight coupling between the three conductors, the top metals are also automatically forced to ground for their even-mode signals, while the differential operation/terminations remain unaffected. In Fig. 4.25, the structure of the designed compensated Marchand balun as well as the measured and simulated transmission loss from the balanced input port to the unbalanced output port are shown. The DPA die is mounted on an FR4 PCB. The Marchand balun, as shown in Fig. 4.26, is fabricated separately on a two-layer Rogers material.
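The reason a short circuit at $\lambda_{even}/8$ appears as an open circuit at the second harmonic can be illustrated with the textbook input impedance of a lossless short-circuited line, $Z_{in} = jZ_0\tan(\beta l)$. The sketch below rests on two assumptions that are not stated in the text: $\lambda_{even}$ is taken as the even-mode wavelength at the fundamental $f_0$, and the line is treated as lossless and non-dispersive, so that the wavelength at $2f_0$ is half of that at $f_0$. The function name and the normalized $Z_0$ are hypothetical.

```python
import math


def shorted_line_reactance(length_m: float, wavelength_m: float, z0_ohm: float = 1.0) -> float:
    """Input reactance X of a lossless short-circuited line: Z_in = j Z0 tan(beta l)."""
    beta_l = 2.0 * math.pi * length_m / wavelength_m
    return z0_ohm * math.tan(beta_l)


LAMBDA_EVEN = 59e-3   # even-mode wavelength from the text (assumed to be at f0)
LAMBDA_ODD = 36e-3    # odd-mode wavelength from the text

via_position = LAMBDA_EVEN / 8.0                                        # about 7.4 mm
x_f0 = shorted_line_reactance(via_position, LAMBDA_EVEN)                # tan(pi/4): X = Z0
x_2f0 = shorted_line_reactance(via_position, LAMBDA_EVEN / 2.0)         # tan(pi/2): open circuit

print(f"via position      : {via_position * 1e3:.2f} mm")
print(f"lambda_odd / 4    : {LAMBDA_ODD / 4.0 * 1e3:.2f} mm")
print(f"|X| / Z0 at f0    : {abs(x_f0):.3f}")
print(f"|X| / Z0 at 2 f0  : {abs(x_2f0):.3e}")
```

At $2f_0$ the electrical length of the shorted section is a quarter wavelength, so the reactance grows without bound (numerically a very large value), which corresponds to the even-mode open circuit that the via is meant to create.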

# **4.5.** MEASUREMENT RESULTS

The measurement setup is shown in Fig. 4.27. An analog off-chip I/Q modulator provides the phase-modulated (PM) RF clock. Since the output pads of the balun are not located at the edge of the chip, the static continuous-wave (CW) measurements of output power ( $P_{OUT}$ ), DE, and power added efficiency (PAE) are carried out by probing to avoid loss caused by the long bond wires. The dynamic measurements are performed after wire bonding. PAE includes all on-chip power consumption including the sub-PA drivers, digital decoders, multiphase RF clocking circuit, and LDO. All of the measurements are



Figure 4.26: Fabricated PCB of the off-chip Marchand balun.



Figure 4.27: Measurement setup.

conducted without using any type of DPD.

# **4.5.1.** STATIC (CW) POWER/EFFICIENCY MEASUREMENTS

The peak output power and efficiency of the DPA with the on-chip MN at different carrier frequencies are measured using CW signals over the range of 1.5 GHz to 3 GHz for different output stage  $V_{DD}$ s as plotted in Fig. 4.28a. By increasing the  $V_{DD}$  from 0.5 V to 0.7 V, the peak output power increases accordingly from 14.3 dBm to 17.3 dBm (3 dB)



Figure 4.28: (a) Measured peak DE (%), PAE (%), and  $P_{OUT}$  (dBm) of the DPA with the on-chip MN vs. carrier frequency for  $V_{DD}$  = 0.5, 0.6, and 0.7 V; (b) measured  $P_{OUT}$  and  $P_{DC}$  normalized to  $P_{DC,Max}$ , DE and PAE vs. normalized output amplitude at 2.2 GHz with  $V_{DD}$  = 0.5 V, showing a linear roll-off for DE similar to class-B.

at $f_C = 2$ GHz and from 14.6 dBm to 17.6 dBm at $f_C = 2.2$ GHz. The 1 dB bandwidth of the peak $P_{OUT}$ ranges from 1.5 GHz to 2.7 GHz, corresponding to a fractional bandwidth of over 57 %. The peak PAE also increases from 24 % to 29 % at $f_C = 2$ GHz and from 26 % to 32 % at $f_C = 2.2$ GHz. The peak drain efficiency does not depend on $V_{DD}$; it reaches approximately 37 % at $f_C = 2$ GHz and 44 % at $f_C = 2.2$ GHz. The measured $P_{OUT}$ and $P_{DC}$ at $f_C = 2.2$ GHz with $V_{DD} = 0.5$ V are normalized to the maximum measured DC power and plotted in Fig. 4.28b versus the normalized output amplitude, together with DE and PAE. As can be seen, $P_{DC}$ and DE have an almost linear roll-off versus the output amplitude, similar to class-B. As expected, PAE rolls off nonlinearly, since this parameter includes the power consumption of all the other circuit blocks, which do not scale with output power. The measured and simulated power breakdown of the DPA at full power (ACW = 511) is shown in Table 4.1. Similarly, the peak $P_{OUT}$, DE, and PAE of the DPA with the off-chip MN are measured and plotted versus center frequency for $V_{DD} = 0.7$ V in Fig. 4.29a, which illustrates a very wideband performance with more than 50 % DE over 2.2–3 GHz at 16–17 dBm of output power. The peak $P_{OUT}$, DE, and PAE are 17.2 dBm, 67 %, and 45 % at 2.6 GHz, respectively.
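The 3 dB power increase and the fractional bandwidth quoted above can be reproduced with two one-line formulas. This is a minimal sketch under the common assumption that the output power of a switch-mode PA with a fixed load scales with $V_{DD}^2$; the function names are hypothetical.

```python
import math


def power_change_db(vdd_from: float, vdd_to: float) -> float:
    """Output power change in dB, assuming P_OUT is proportional to V_DD squared."""
    return 20.0 * math.log10(vdd_to / vdd_from)


def fractional_bandwidth(f_low_hz: float, f_high_hz: float) -> float:
    """Bandwidth divided by the arithmetic center frequency."""
    return (f_high_hz - f_low_hz) / ((f_high_hz + f_low_hz) / 2.0)


print(f"0.5 V -> 0.7 V : {power_change_db(0.5, 0.7):.2f} dB")
print(f"1.5-2.7 GHz    : {fractional_bandwidth(1.5e9, 2.7e9) * 100:.1f} %")
```

Both results, about 2.9 dB and about 57 %, agree with the measured values reported in the text.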

#### **4.5.2.** STATIC LINEARITY MEASUREMENT BY TRIANGLE SIGNAL

Figure 4.29: (a) Measured peak DE (%), PAE (%), and  $P_{OUT}$  (dBm) vs. carrier frequency, and (b) ACW-AM and ACW-PM of the DPA with the off-chip MN with  $V_{DD}$  = 0.7 V.

Figure 4.30: Measured semi-static linearity of the DPA with the on-chip MN using a triangle signal at 2 GHz and  $V_{DD} = 0.5$ V: (a) ACW-AM for various LDO settings, and (b) ACW-PM after each iteration of the optimization algorithm. The numbers in the brackets show the codes of the five delay offsets.

Figure 4.31: Measured ACW-AM and ACW-PM of the DPA with the on-chip MN under load variations: (a) before correction, and (b) after correcting the LDO and delay offset settings.

Since the input signal in a DPA is digital, it is straightforward to generate a perfect (quantized) ramp or triangle as an input signal and directly measure the (semi-)static linearity. Consequently, in this measurement, a 4096-sample triangle signal is generated and programmed into the on-chip SRAM, resulting in a 152.6 kHz AM signal without phase modulation. At the output, 128 periods of the signal are measured and averaged. The ACW-AM curves for different LDO settings are depicted in Fig. 4.30a, which shows the effectiveness of the overdrive voltage tuning in controlling the concavity of the ACW-AM curve. The ACW-PM curves without and with multiphase clocking after each iteration of the optimization algorithm, as well as the corresponding delay codes [Delay<sub>Seg1</sub> Delay<sub>Seg2</sub> ... Delay<sub>Seg5</sub>], are depicted in Fig. 4.30b. After 4 iterations, the phase error from ACW = 64 to ACW = 511 is less than  $\pm 1^{\circ}$. Furthermore, the effect of load variations from  $25\,\Omega$  to  $100\,\Omega$  is measured and shown in Fig. 4.31a. Although the linearity degrades slightly, by retuning the LDO and delay offsets, as shown in Fig. 4.31b, the DPA is once again optimally linearized. The ACW-AM and ACW-PM curves of the DPA with the off-chip MN are measured and plotted in Fig. 4.29b, showing significant improvement in the linearity without using any sort of DPD.
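The test signal used for this measurement is simple to reproduce. The following is a minimal sketch, assuming a 9-bit ACW range of 0 to 511 (the text quotes ACW = 511 as full power) and a symmetric up/down ramp; the exact sample layout stored in the SRAM is not specified in the text, and the function name is hypothetical.

```python
def triangle_acw(num_samples: int = 4096, max_code: int = 511) -> list:
    """One period of a quantized triangle: ramp up over the first half, down over the second."""
    half = num_samples // 2
    ramp_up = [round(i * max_code / (half - 1)) for i in range(half)]
    return ramp_up + ramp_up[::-1]


FS = 625e6                      # SRAM read-out rate from the text
codes = triangle_acw()
f_am = FS / len(codes)          # repetition rate of the triangle

print(f"samples per period : {len(codes)}")
print(f"code range         : {min(codes)} .. {max(codes)}")
print(f"AM frequency       : {f_am / 1e3:.1f} kHz")
```

Reading 4096 samples at 625 MHz gives a repetition rate of about 152.6 kHz, which matches the AM frequency stated above. Because the input is an exact quantized triangle, any deviation of the measured envelope or phase from the ideal shape can be attributed to the DPA itself.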

# 4.5.3. MODULATED SIGNAL MEASUREMENT

The dynamic performance of the DPA with the on-chip MN is measured with QAM and OFDM modulated signals without using any type of DPD. Fig. 4.32 shows the spectrum and constellation diagram of a 20 MHz 64-QAM signal with PAPR = 6.5 dB, and an OFDM 64-QAM signal with PAPR = 8.1 dB, measured at  $f_C$  = 2 GHz with  $V_{DD}$  = 0.5 V.

The ACPR1 / ACPR2 are as low as -40 / -50 dBc and -46 / -50 dBc, respectively. The measured EVM results are -35 dB and -36 dB. The measured DE results are 18 % and 15.2 %, respectively, and the PAE results are 12.6 % and 10.7 %, respectively. The DPA is also measured with 40 MHz and 80 MHz OFDM 64-QAM signals. The measured spectra and constellation diagrams are shown in Fig. 4.33. The ACPR1 / ACPR2 are -40 / -50 dBc and -34 / -42 dBc, respectively. The measured EVM results are -33 dB and -26 dB. Similarly, the



Figure 4.32: Measured spectrum and constellation diagram of the DPA with the on-chip MN: (a) 20 MHz 64-QAM signal, and (b) 20 MHz OFDM 64-QAM signal.



Figure 4.33: Measured spectrum and constellation diagram of the DPA with the on-chip MN with a (a) 40 MHz OFDM 64-QAM signal, and (b) 80 MHz OFDM 64-QAM signal.

dynamic performance of the DPA with the off-chip MN for a 40 MHz QAM signal is measured at  $f_C = 2.6$  GHz with  $V_{DD} = 0.7$  V and depicted in Fig. 4.34, showing a -40 dBc ACPR and -30 dB EVM without applying any sort of DPD.
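Since EVM is quoted here in dB while many specifications state it as a percentage, the standard conversion $\mathrm{EVM}_{\%} = 100 \cdot 10^{\mathrm{EVM_{dB}}/20}$ may help in reading the numbers above. The sketch below only applies this formula to the values reported in the text; the function name is hypothetical.

```python
def evm_db_to_percent(evm_db: float) -> float:
    """Convert an EVM value in dB to an rms percentage."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


# EVM values reported in this section (dB).
for label, evm_db in [
    ("20 MHz 64-QAM", -35.0),
    ("20 MHz OFDM", -36.0),
    ("40 MHz OFDM", -33.0),
    ("80 MHz OFDM", -26.0),
    ("40 MHz 64-QAM, off-chip MN", -30.0),
]:
    print(f"{label:28s}: {evm_db:6.1f} dB = {evm_db_to_percent(evm_db):.2f} %")
```

For instance, -40 dB corresponds to exactly 1 %, so the reported values span roughly 1.6 % to 5 %.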

Figure 4.34: Measured spectrum and constellation diagram of the DPA with the off-chip MN with a 40 MHz 64-QAM signal centered at  $f_C$  = 2.6 GHz with  $V_{DD}$  = 0.7 V.

Figure 4.35: Measured spectrum of the DPA with the on-chip MN with 20 MHz OFDM signals under different (a) load conditions and (b)  $V_{DD}$s.

Furthermore, without retuning the LDO and delay offset settings, the 20 MHz OFDM signal is measured under different DPA loads and DC supply voltages, as shown in Fig. 4.35. As can be seen, the output spectrum easily passes the 802.11ac/g masks for  $V_{DD}$ = 0.5, 0.6, and 0.7 V (resulting in $P_{OUT}$ = 6.2, 7.8, and 9.2 dBm, respectively). Although linearity is degraded by load variations, the output spectrum can still easily pass the 802.11ac/g masks. In addition, the output spectra of 40 MHz and 80 MHz OFDM signals are also measured with different  $V_{DD}$s, as shown in Fig. 4.36. Measurement results show that, for very wideband signals (BW > 40 MHz), the output spectrum can pass the



Figure 4.36: Measured spectrum of the DPA with the on-chip MN with (a) 40 MHz and (b) 80 MHz OFDM signals under different  $V_{DD}$ s.



Figure 4.37: Measured out-of-band spectrum of the DPA with the on-chip MN with a 20 MHz OFDM signal measured at  $f_C$  = 2 GHz.

802.11ac/g masks by only slightly retuning the LDO settings (less than 5 LSB $\approx$ 60 mV). The out-of-band spectrum of a 20 MHz OFDM signal centered at 2 GHz is measured and shown in Fig. 4.37. The spectral sampling replicas are located at $\pm 625$ MHz offset frequency and attenuated by the zero-order-hold (ZOH) transfer function to less than

| Parameter                                    | 20 MHz OFDM 64-QAM based on 802.11g (measurement) | CW full power at ACW = 511 (measurement) | CW full power at ACW = 511 (simulation) |
|----------------------------------------------|---------------------------------------------------|------------------------------------------|-----------------------------------------|
| Output Power (dBm)                           | 6.2                                               | 14.3                                     | 15.6                                    |
| Drain DC Power (mW)                          | 27.4                                              | 71                                       | 85                                      |
| Buffers & Digital DC Power (mW)              | 4.8                                               | 23                                       | 21.4                                    |
| Multiphase Clocking DC Power (mW)            | 6.6                                               | 6.6                                      | 6.8                                     |
| LDO DC Power (mW)                            | 0.2                                               | 1.1                                      | 1.9                                     |
| SRAM DC Power (mW) (not included in PAE)     | 32                                                | 26.4                                     | NA                                      |

Table 4.1: Measured and simulated power breakdown of the DPA with the on-chip MN at  $f_C = 2$  GHz.

-40 dBc.
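The order of magnitude of this replica suppression can be estimated from the ideal ZOH response $|H(f)| = |\sin(\pi f/F_S)/(\pi f/F_S)|$. The sketch below is an approximation under assumptions that are not taken from the measurement: an ideal ZOH at $F_S$ = 625 MHz, a signal with a flat spectrum over the full 20 MHz, and no additional filtering by the matching network. The function names are hypothetical.

```python
import math


def zoh_gain(f_hz: float, fs_hz: float) -> float:
    """Magnitude of the ideal zero-order-hold response at frequency f_hz."""
    x = f_hz / fs_hz
    if x == 0.0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))


def mean_band_power(f_center_hz: float, bw_hz: float, fs_hz: float, points: int = 2000) -> float:
    """Average of |H(f)|^2 over a band, sampled at the midpoints of equal sub-intervals."""
    step = bw_hz / points
    total = 0.0
    for i in range(points):
        f = f_center_hz - bw_hz / 2.0 + (i + 0.5) * step
        total += zoh_gain(f, fs_hz) ** 2
    return total / points


FS = 625e6    # sampling rate from the text
BW = 20e6     # signal bandwidth from the text

# Power of the first replica (around FS offset) relative to the wanted signal.
replica_dbc = 10.0 * math.log10(mean_band_power(FS, BW, FS) / mean_band_power(0.0, BW, FS))
print(f"first replica, ideal ZOH: {replica_dbc:.1f} dBc")
```

Under these assumptions the first replica sits slightly below -40 dBc, which is consistent with the measured level. The replica is centered on a null of the ZOH response, so narrower signals are suppressed more strongly than wider ones.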

The measured power breakdown of the DPA with the on-chip MN at 8.1 dB power back-off (PBO) with a 20 MHz OFDM signal is given in Table 4.1. As can be seen, the total power consumption of the LDO and multiphase RF clocking is approximately 6.8 mW, while the measured power consumption of the SRAM is approximately 32 mW. Thus, compared to an LUT-DPD-based linearization technique, the techniques proposed in this work save more than 25 mW of power, which significantly improves the efficiency of the DPA. Table 4.2 summarizes this work and compares it with the prior art. Compared to the prior art that targets high efficiency by using a nonlinear DPA, this work achieves comparable or higher efficiency without compromising the linearity. Compared to the prior art that uses linear DPA structures, this work achieves higher linearity, higher signal bandwidth, and higher efficiency or output power.
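The efficiency figures for the modulated signal follow directly from the entries of Table 4.1. The sketch below recomputes them under the assumption that PAE is taken as output power divided by the total DC power of the drain, buffers and digital logic, multiphase clocking, and LDO, with the RF input drive power neglected; the variable names are hypothetical.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


# 20 MHz OFDM column of Table 4.1 (measured values, in mW unless noted).
p_out_mw = dbm_to_mw(6.2)
drain_mw = 27.4
buffers_digital_mw = 4.8
multiphase_mw = 6.6
ldo_mw = 0.2
sram_mw = 32.0   # stands in for the LUT of a DPD-based approach; not included in PAE

de = p_out_mw / drain_mw
pae = p_out_mw / (drain_mw + buffers_digital_mw + multiphase_mw + ldo_mw)
saving_mw = sram_mw - (multiphase_mw + ldo_mw)

print(f"DE     : {de * 100:.1f} %")
print(f"PAE    : {pae * 100:.1f} %")
print(f"saving : {saving_mw:.1f} mW")
```

With these inputs the sketch gives a DE of about 15.2 %, a PAE of about 10.7 %, and a saving of about 25.2 mW, in line with the values reported for the 20 MHz OFDM signal and with the stated saving of more than 25 mW.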

# 4.6. CONCLUSION

While the linearity and power/efficiency of a TX stage are normally traded off against each other, this work provides advanced circuit-level linearization techniques suitable for switch-mode DPAs without the need for DPD or any compromise on the output power or efficiency. Three novel circuit techniques are introduced, namely nonlinear sizing, overdrive-voltage tuning, and multiphase RF clocking. The combination of these

| Technology (nm) | DPD              | Peak PAE (%)                    | Peak DE (%)       | Peak P<sub>OUT</sub> (dBm) | f<sub>C</sub> (GHz) | DPA V<sub>DD</sub> (V) | EVM (dB)               | ACPR1 (dBc)           | Bandwidth (MHz) | Matching Network | Architecture     | Reference |
|-----------------|------------------|---------------------------------|-------------------|----------------------------|---------------------|------------------------|------------------------|-----------------------|-----------------|------------------|------------------|-----------|
| 40              | NO               | 24<sup>3</sup> / 26<sup>3</sup> | 37 / 43.8         | 14.3 / 14.6                | 2.0 / 2.2           | 0.5                    | -36 (OFDM 64-QAM)      | -46 @ 2.0             | 20              | On-chip          | Polar DPA        | This work |
|                 |                  |                                 |                   |                            |                     |                        | -33 (OFDM 64-QAM)      | -40 5GHz              | 40              |                  |                  |           |
|                 |                  |                                 |                   |                            |                     |                        | **-26** (OFDM 64-QAM)  | **-34** @ 2.04GHz     | 80              |                  |                  |           |
| 40              | NO               | **45**<sup>3</sup>              | 67                | 17.2                       | 2.6                 | 0.7                    | **-30** (64-QAM)       | -40                   | 40              | Off-chip         | Polar DPA        | This work |
| 65              | YES              | 35                              | 40.7              | 28.1                       | 2.6                 | ω                      | -36.3 (256-QAM)        | -27.9                 | 8               | On-chip          | Polar DPA        | [8]       |
| 45 SOI          | YES              | 30.4                            | NA                | 25.3                       | 3.5                 | 1.2 / 2.4              | -40 (256-QAM)          | -45                   | 10              | On-chip          | Polar VMD SC-DPA | [9]       |
| 28              | YES              | NA                              | 39                | 28.2                       | 2.45                | 2.2                    | -25 (64-QAM)           | -35<sup>1</sup>       | 20              | On-chip          | Polar DPA        | [10]      |
| 28              | YES<sup>2</sup>  | NA                              | 42.7              | 24.5                       | 2.5                 | 1.4                    | -31.9 (64-QAM)         | -31.5                 | 20              | On-chip          | Polar DPA        | [11]      |
| 65              | NO               | NA                              | 35                | 24                         | 2.1                 | 2.1                    | -28 (64-QAM)           | -28<sup>1</sup>       | 20              | On-chip          | Polar DPA        | [12]      |
| 28              | NO               | 15.3<sup>1</sup>                | 27.4<sup>1</sup>  | 8<sup>1</sup>              | -                   | 0.9 / 1.8              | NA                     | -42                   | 20              | Off-chip         | IQ QDAC          | [13]      |
| 40              | NO               | 45                              | NA                | 8                          | 0.93                | -                      | -27.1 (64-QAM)         | -42<sup>1</sup>       | 2               | Off-chip         | Polar SC-DPA     | [14]      |

Table 4.2: Performance summary and comparison with the prior art.

<sup>1</sup>Calculated or estimated from the paper figures or tables.

<sup>2</sup> LUT-DPD for AM-AM, no DPD for AM-PM

<sup>3</sup>PAE includes the power consumption of all drivers, digital decoders, the multiphase RF clock generator and LDO, excluding the SRAM.

inventive techniques leads to very high TX linearity for wideband signals, with both on-chip and off-chip load/matching networks. They also allow for digitally controlled fine tuning to manage the variations in PVT, operating frequency, and output load. The nonlinear behavior of a class-E DPA is thoroughly analyzed, and closed-form equations are given to predict the amplitude-code-word (ACW)-AM and ACW-PM curves of the DPA. Compared to the prior art that uses "linear" DPD-less DPA structures, this work achieves higher linearity, higher signal bandwidth, and higher efficiency or higher output power. Compared to DPAs in general (including those using DPD), this work can still provide better linearity with comparable efficiency. Furthermore, compared to the version with the on-chip matching network, the version using a novel compensated Marchand balun with re-entrant coupled lines achieves very high efficiency over a wide bandwidth.

## REFERENCES

- [1] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, *An Intrinsically Linear Wideband Polar Digital Power Amplifier*, IEEE Journal of Solid-State Circuits 52, 3312 (2017).
- [2] M. Hashemi, L. Zhou, Y. Shen, M. Mehrpoo, and L. de Vreede, *Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and 40 dBc ACLR without DPD*, in 2017 IEEE MTT-S International Microwave Symposium (IMS) (2017) pp. 2025–2028.
- [3] R. B. Staszewski, J. L. Wallberg, S. Rezeq, C.-M. Hung, O. E. Eliezer, S. K. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad, and D. Leipold, *All-Digital PLL and Transmitter for Mobile Phones*, IEEE J. of Solid-State Circuits 40, 2469 (2005).
- [4] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, *A Digitally Modulated Polar CMOS Power Amplifier With a 20-MHz Channel Bandwidth*, IEEE J. of Solid-State Circuits 43, 2251 (2008).
- [5] C. D. Presti, F. Carrara, A. Scuderi, P. M. Asbeck, and G. Palmisano, *A 25 dBm Digitally Modulated CMOS Power Amplifier for WCDMA/EDGE/OFDM With Adaptive Digital Predistortion and Efficient Power Control*, IEEE J. of Solid-State Circuits 44, 1883 (2009).
- [6] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, *An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology*, IEEE J. of Solid-State Circuits 46, 1796 (2011).
- [7] L. Ye, J. Chen, L. Kong, E. Alon, and A. Niknejad, *Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation*, IEEE J. of Solid-State Circuits 48, 3160 (2013).
- [8] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, *A Wideband 2 × 13-bit All-Digital I/Q RF-DAC*, IEEE Trans. on Microw. Theory Techn. 62, 732 (2014).
- [9] J. S. Park, S. Hu, Y. Wang, and H. Wang, *A Highly Linear Dual-Band Mixed-Mode Polar Power Amplifier in CMOS with An Ultra-Compact Output Network*, IEEE J. of Solid-State Circuits 51, 1756 (2016).
- [10] V. Vorapipat, C. Levy, and P. Asbeck, *A Class-G Voltage-Mode Doherty Power Amplifier*, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 46–47.
- [11] D. Cousinard, R. Winoto, H. Li, Y. Fang, A. Ghaffari, A. Olyaei, O. Carnu, P. Godoy, A. Wong, X. Zhao, J. Liu, A. Mitra, R. Tsang, and L. Lin, *A 0.23mm<sup>2</sup> Digital Power Amplifier with Hybrid Time/Amplitude Control Achieving 22.5dBm at 28% PAE for 802.11g*, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 228–229.
- [12] J. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, *A 24dBm 2-to-4.3GHz Wideband Digital Power Amplifier with Built-In AM-PM Distortion Self-Compensation*, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 230–231.
- [13] S. Zheng and H. C. Luong, *A WCDMA/WLAN Digital Polar Transmitter With Low-Noise ADPLL, Wideband PM/AM Modulator, and Linearized PA*, IEEE J. of Solid-State Circuits 50, 1645 (2015).
- [14] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, *An Incremental-Charge-Based Digital Transmitter With Built-in Filtering*, IEEE J. of Solid-State Circuits 50, 3065 (2015).
- [15] A. Ba, Y. H. Liu, J. van den Heuvel, P. Mateman, B. Büsze, J. Dijkhuis, C. Bachmann, G. Dolmans, K. Philips, and H. D. Groot, *A 1.3 nJ/b IEEE 802.11ah Fully-Digital Polar Transmitter for IoT Applications*, IEEE J. of Solid-State Circuits 51, 3103 (2016).
- [16] M. Babaie, F. W. Kuo, H. N. R. Chen, L. C. Cho, C. P. Jou, F. L. Hsueh, M. Shahmohammadi, and R. B. Staszewski, *A Fully Integrated Bluetooth Low-Energy Transmitter in 28 nm CMOS With 36% System Efficiency at 3 dBm*, IEEE J. of Solid-State Circuits 51, 1547 (2016).
- [17] G. D. Ewing, *High-Efficiency Radio-Frequency Power Amplifiers*, Ph.D. thesis, Dept. Elect. Eng., Oregon State University (1964).
- [18] N. O. Sokal and A. D. Sokal, *Class E – A New Class of High-Efficiency Tuned Single-Ended Switching Power Amplifiers*, IEEE J. of Solid-State Circuits 10, 168 (1975).
- [19] S. A. El-Hamamsy, *Design of High-Efficiency RF Class-D Power Amplifier*, IEEE Trans. Power Electronics 9, 297 (1994).
- [20] H. Kobayashi, J. Hinrichs, and P. M. Asbeck, *Current Mode Class-D Power Amplifiers for High Efficiency RF Applications*, in 2001 IEEE MTT-S Int. Microw. Symp. Digest (Cat. No.01CH37157), Vol. 2 (2001) pp. 939–942.
- [21] S. Cripps, *RF Power Amplifiers for Wireless Communications*, Artech House microwave library (Artech House, 2006).
- [22] C. Lu, H. Wang, C. Peng, A. Goel, S. Son, P. Liang, A. Niknejad, H. Hwang, and G. Chien, *A 24.7dBm All-Digital RF Transmitter for Multimode Broadband Applications in 40nm CMOS*, in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers (2013) pp. 332–333.
- [23] Z. Deng, E. Lu, E. Rostami, D. Sieh, D. Papadopoulos, B. Huang, R. Chen, H. Wang, W. Hsu, C. Wu, and O. Shanaa, *9.5 A Dual-Band Digital-WiFi 802.11a/b/g/n Transmitter SoC with Digital I/Q Combining and Diamond Profile Mapping for Compact Die Area and Improved Efficiency in 40nm CMOS*, in 2016 IEEE International Solid-State Circuits Conference (ISSCC) (2016) pp. 172–173.
- [24] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, *The Class-E/F Family of ZVS Switching Amplifiers*, IEEE Trans. on Microw. Theory Techn. 51, 1677 (2003).
- [25] M. Acar, A. J. Annema, and B. Nauta, *Analytical Design Equations for Class-E Power Amplifiers*, IEEE Trans. on Circuits and Systems I: Regular Papers 54, 2706 (2007).
- [26] P. T. M. van Zeijl and M. Collados, *A Digital Envelope Modulator for a WLAN OFDM Polar Transmitter in 90 nm CMOS*, IEEE J. of Solid-State Circuits 42, 2204 (2007).
- [27] B. Razavi, *Design of Analog CMOS Integrated Circuits* (2016).
- [28] C.-H. Lin and K. Bult, *A 10-b, 500-MSample/s CMOS DAC in 0.6 mm<sup>2</sup>*, IEEE Journal of Solid-State Circuits 33, 1948 (1998).
- [29] M. Maymandi-Nejad and M. Sachdev, *A Monotonic Digitally Controlled Delay Element*, IEEE J. of Solid-State Circuits 40, 2212 (2005).
- [30] A. Oppenheim and R. Schafer, *Discrete-time Signal Processing*, Prentice-Hall signal processing series (Prentice Hall, 1989).
- [31] F. H. Raab, *Class-E, Class-C, and Class-F Power Amplifiers based upon a Finite Number of Harmonics*, IEEE Trans. on Microw. Theory Techn. 49, 1462 (2001).
- [32] R. K. Mongia, I. J. Bahl, and P. Bhartia, *RF and Microwave Coupled-Line Circuits* (Artech House, 2007).
- [33] P. Tsai, Y. Lin, J. Kuo, Z. Tsai, and H. Wang, *Broadband Balanced Frequency Doublers With Fundamental Rejection Enhancement Using a Novel Compensated Marchand Balun*, IEEE Transactions on Microwave Theory and Techniques 61, 1913 (2013).

# 5

# A HIGHLY-LINEAR WIDEBAND POLAR CLASS-E CMOS DIGITAL DOHERTY PA

This chapter largely follows the paper [1], published in TMTT in 2019. As a result, it overlaps slightly with the discussion in Chapter 4; the text has been left largely unchanged to preserve its coherence.



Figure 5.1: Digital-intensive Polar TX with digital Doherty PA.

# **5.1.** INTRODUCTION

In wireless systems, a high data rate is normally achieved by using wideband signals with high QAM/OFDM modulation orders, resulting in a high peak-to-average-power ratio (PAPR) [2]. This forces the PA to operate in deep power backoff (PBO), thus reducing its power efficiency if no efficiency enhancement technique is applied. Among the various efficiency enhancement techniques, such as envelope tracking [3, 4] and the Doherty technique [2, 5–15], the Doherty technique remains one of the most widely used because of its relatively simple and low-cost implementation, which is easily applicable to a digital polar TX architecture, as depicted in Fig. 5.1. Using an off-chip matching network reduces the losses of the passives, thus increasing the efficiency, especially at PBO levels, compared to an on-chip matching network implementation [11–13].

Conventional TX design approaches are often based on using a nonlinear PA to achieve high efficiency and then linearize it by applying digital predistortion (DPD) techniques [2, 5–8, 10, 12, 13, 15].

Furthermore, as will be discussed in Section 5.4, even with ideal DPD, due to the highly nonlinear operation mode of class-E digital PAs, it is not possible to achieve maximum spectral purity and minimum error-vector-magnitude (EVM) for a given number of bits with a conventional uniform digital PA structure [16]. The bandwidth of a digital AM signal is mostly limited by the sampling rate and not by analog blocks, as occurs in an analog-intensive polar TX [17, 18]; thus, in principle, a digital-intensive polar TX can handle a higher signal bandwidth [13, 14, 19–25].

In addition to high video bandwidth, high RF bandwidth is also of great importance. There are three main challenges in increasing the RF bandwidth of a class-E Doherty PA: (A) the bandwidth limitation of the class-E PA, (B) the limited bandwidth of the impedance inverter, and (C) the limited bandwidth of the balun. These can be mitigated using three different techniques, respectively: reactance compensation [26], a shunt open-circuit $\lambda/2$ section [10, 15, 27] parallel to the load, and a compensated Marchand balun with re-entrant coupled lines [24, 28–30].

In this work, for the first time, a linear digital-intensive polar class-E Doherty PA is demonstrated in which the linearity is significantly enhanced using circuit-level linearization techniques with automatic duty cycle correction. Wideband efficiency enhancement is achieved by using a reactance-compensated parallel-circuit class-E PA along with a wideband impedance inverter and a novel wideband Marchand balun-based Doherty power combiner, implemented using re-entrant coupled lines with independent second harmonic control. Nonlinear sizing, multiphase RF clocking and overdrive-voltage control have recently been used successfully to linearize single PAs with both on-chip [23, 25] and off-chip [24] matching networks at the circuit level without using DPD.

In the following, a wideband class-E Doherty PA and a digital Doherty PA are discussed in Sections 5.2 and 5.3, respectively. System-level design considerations are discussed in Section 5.4, and the circuit-level linearization techniques are described in Section 5.5. The final design and implementation are explained in Section 5.6, followed by the measurement results and conclusion in Sections 5.7 and 5.8, respectively.

# **5.2.** WIDEBAND CLASS-E DOHERTY PA

A symmetric Doherty PA, as shown in Fig. 5.2a, consists of a *main* (or carrier) and a *peak* (or auxiliary) power amplifier, where the peak PA is only active beyond the 6 dB PBO point, resulting in an additional peak in the efficiency, as shown in Fig. 5.2b. The output powers are combined using an impedance inverter. To maintain linearity, efficiency is typically compromised at the high-efficiency power backoff point to ease DPD [2, 5–8, 10–12, 15]. To achieve higher efficiency, switch-mode PAs can also be used as branch amplifiers. Among the switch-mode PAs, the class-E has one of the simplest load networks, and it can theoretically provide up to 100% drain efficiency while absorbing the drain capacitance in its load network [2, 31–34]. In this section, different techniques to mitigate the bandwidth-limiting factors, as highlighted in Fig. 5.2a, are described.

Figure 5.2: (a) The simplified single-ended structure of a class-E Doherty PA with TL-based impedance inverter, highlighting three different bandwidth-limiting factors: (A) the class-E load network, (B) the impedance inverter, and (C) the matching network/balun, (b) normalized drain efficiency versus output power backoff.

Figure 5.3: (a) Single push-pull class-E PA, and (b) angle of the impedance seen by drain.

#### 5.2.1. REACTANCE COMPENSATED PARALLEL-CIRCUIT CLASS-E PA

The general topology of a push-pull class-E PA with finite dc feed inductance is shown in Fig. 5.3a. The resonance factor is defined as $q_D = 1/(\omega_0 \sqrt{L_D C_D})$. As mentioned in Chapter 2, it has been shown that for $q_D = 1.412$, the output power for a given $V_{DD}$ and $R_L$ reaches its maximum, and the series reactance X can be zero [26, 35, 36]. Such a structure, known as parallel-circuit class-E, has a higher maximum operating frequency and higher load resistance [36]. To achieve wideband RF operation, the load angle seen by the intrinsic drain should remain constant over the required bandwidth. This can be done through reactance compensation [26, 35]. By properly choosing the parameters of the series resonator, a constant load angle, as shown in Fig. 5.3b, can be achieved over a wide frequency band, resulting in the optimum $Q_{series} = 1.026$ [35]. However, in an ideal class-E PA, a high $Q_{series}$ is required to block all the harmonics in the series resonator, because otherwise the efficiency drops. By using a push-pull configuration with a differential matching network, the orthogonality between odd and even terminations can be used to ensure a very high even-mode (second harmonic) impedance, and as such relax the $Q_{series}$ requirement of the series resonator, achieving wideband operation without compromising the efficiency of the class-E PA.

Figure 5.4: (a) Conventional and compensated impedance inverter, (b) Smith chart showing the input impedances at 6 dB PBO, (c) normalized magnitude and angle of the input impedance at 6 dB PBO vs. normalized frequency, and (d) Doherty PA with a compensated impedance inverter and the ideal class-B drain efficiency curves at 6 dB PBO vs. frequency.

#### **5.2.2.** COMPENSATED IMPEDANCE INVERTER

Doherty implementations normally use a quarter-wave transmission line (QWTL) or its lumped equivalent as the impedance inverter (Fig. 5.4a) [2, 5–8, 11–14]. As can be seen from Fig. 5.4b and 5.4c, the magnitude and phase of the impedance $Z_m$ seen by the main PA are highly sensitive to frequency. By adding an open-circuit compensation half-wave transmission line (HWTL) in parallel to the load [10, 15, 27], as shown in Fig. 5.4a, the input impedance is given by:

$$Z_m(f) = \frac{R_L Z_0 \left(1 - \tan(\pi \frac{f}{f_0}) \tan(\frac{\pi}{2} \frac{f}{f_0})\right) + j Z_0^2 \tan(\frac{\pi}{2} \frac{f}{f_0})}{Z_0 + j R_L \left(\tan(\pi \frac{f}{f_0}) + \tan(\frac{\pi}{2} \frac{f}{f_0})\right)}$$
(5.1)

which shows smaller variations in the magnitude and phase of  $Z_m$  over a larger bandwidth. This structure can be employed in a Doherty configuration as depicted in Fig. 5.4d to expand the efficiency bandwidth.
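As a rough numerical check of (5.1), the following sketch evaluates the input impedance of a conventional QWTL inverter and of the compensated inverter at a normalized frequency $f/f_0$. The function names and the values $Z_0 = 50\,\Omega$ and $R_L = 25\,\Omega$ (chosen so that $Z_m = 100\,\Omega$ at $f_0$) are illustrative assumptions, not values taken from this design.

```python
import cmath
import math


def zm_conventional(f_norm, r_load, z0):
    """Input impedance of a plain QWTL (length pi/2 at f0) terminated in r_load."""
    t_q = math.tan(0.5 * math.pi * f_norm)
    return z0 * (r_load + 1j * z0 * t_q) / (z0 + 1j * r_load * t_q)


def zm_compensated(f_norm, r_load, z0):
    """Eq. (5.1): QWTL with an open-circuit HWTL stub in parallel to the load."""
    t_q = math.tan(0.5 * math.pi * f_norm)  # quarter-wave line term
    t_h = math.tan(math.pi * f_norm)        # half-wave open stub term
    num = r_load * z0 * (1.0 - t_h * t_q) + 1j * z0 ** 2 * t_q
    den = z0 + 1j * r_load * (t_h + t_q)
    return num / den


if __name__ == "__main__":
    # Hypothetical example values (assumptions for illustration only).
    z0, r_load = 50.0, 25.0
    for f_norm in (0.8, 0.9, 1.0, 1.1, 1.2):
        conv = zm_conventional(f_norm, r_load, z0)
        comp = zm_compensated(f_norm, r_load, z0)
        print(f"f/f0={f_norm:.1f}  "
              f"conventional: |Z|={abs(conv):7.2f}, angle={math.degrees(cmath.phase(conv)):6.2f} deg  "
              f"compensated: |Z|={abs(comp):7.2f}, angle={math.degrees(cmath.phase(comp)):6.2f} deg")
```

For these assumed values, both inverters present the same impedance at $f_0$, while the compensated inverter shows a smaller phase excursion away from $f_0$.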

# **5.2.3.** Compensated Marchand Balun with Second Harmonic Control

#### COMPENSATED MARCHAND BALUN

Figure 5.5: (a) Marchand balun, (b) compensated Marchand balun, and (c) compensated Marchand balun with re-entrant coupled lines.

The planar Marchand balun, shown in Fig. 5.5a, is one of the best TL-based topologies, offering wideband amplitude and phase balance with a relatively simple implementation [30, 37]. A conventional Marchand balun is constructed from two $\lambda/4$ coupled lines with short and open terminations at their specific ports, ideally providing a balanced loading condition from a single-ended load. However, in practice, due to the unequal even-mode and odd-mode phase velocities, the conversion from single-ended to balanced operation is not perfect, yielding some imbalance. To correct for this imbalance, a compensation technique can be adopted, where an extra compensation line section is added between the two $\lambda/4$ coupled line sections (Fig. 5.5b), the parameters of which can be calculated as follows [37]:

$$Z_{cm}\cot(\frac{\theta_{cm}}{2}) = \frac{Z_{0o}\csc\theta_e - Z_{0e}\csc\theta_o}{\csc\theta_e\cot\theta_o - \csc\theta_o\cot\theta_e}$$
(5.2)

where $Z_{0o}$ and $Z_{0e}$ are the characteristic impedances, and $\theta_o$ and $\theta_e$ are the electrical lengths of the odd and even modes, respectively. $Z_{cm}$ and $\theta_{cm}$ are the characteristic impedance and the electrical length of the compensation section.

#### **RE-ENTRANT COUPLED LINES**

Figure 5.6: (a) AC grounding at $\lambda_{even}/8$ for second harmonic control, using a via from the floating metal layer to the ground plane, shown for a single PA, (b) EM simulation results of the input impedance at 2<sup>nd</sup> harmonic, shown for a single PA.

The bandwidth of a Marchand balun at low (odd-mode) impedance levels depends on the $Z_{0e}/Z_{0o}$ ratio. This can be very challenging in practice with single-layer transmission lines, as a very small horizontal gap between the coupled lines is required. However, re-entrant coupled lines, as shown in Fig. 5.5c, can achieve very tight coupling without strict fabrication requirements [30]. In the odd mode, $Z_{0o} = Z_{0,1}$, where $Z_{0,1}$ is the impedance between the transmission lines and the floating layer. In the even mode, $Z_{0e} = Z_{0,1} + 2Z_{0,2}$, where $Z_{0,2}$ is the impedance between the floating layer and the bottom plate. In this case, the coupling factor $K = (Z_{0e} - Z_{0o})/(Z_{0e} + Z_{0o}) = 1/(1 + Z_{0,1}/Z_{0,2})$ mostly depends on the $Z_{0,1}/Z_{0,2}$ ratio rather than the horizontal spacing between the coupled lines, thus relaxing the dimensional fabrication requirements. In general, a low $Z_{0,1}/Z_{0,2}$ ratio is preferred. Therefore, with an upper layer with a larger dielectric constant but a smaller thickness compared to the lower layer ($\epsilon_{r1} > \epsilon_{r2}$ and $H_1 < H_2$), a strong coupling coefficient can be expected, resulting in a low-loss, wideband balun. Furthermore, since the effective dielectric constants of the even and odd modes are different, the wavelengths $\lambda = \lambda_{air}/\sqrt{\epsilon_{reff}}$ of these modes, namely $\lambda_{even}$ and $\lambda_{odd}$, are also different.
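The coupling-factor relation for re-entrant coupled lines can be illustrated with a short sketch. The function name and the impedance values are hypothetical and serve only to show that $K$ is set by the $Z_{0,1}/Z_{0,2}$ ratio.

```python
def reentrant_coupling(z01, z02):
    """Return (Z0e, Z0o, K) for re-entrant coupled lines.

    z01: impedance between the transmission lines and the floating layer.
    z02: impedance between the floating layer and the bottom plate.
    """
    z0o = z01               # odd mode
    z0e = z01 + 2.0 * z02   # even mode
    k = (z0e - z0o) / (z0e + z0o)
    return z0e, z0o, k


if __name__ == "__main__":
    # Hypothetical impedance pairs (assumed values, in ohms).
    for z01, z02 in ((20.0, 20.0), (10.0, 40.0), (5.0, 45.0)):
        z0e, z0o, k = reentrant_coupling(z01, z02)
        print(f"Z01={z01:5.1f}  Z02={z02:5.1f}  Z0e={z0e:6.1f}  Z0o={z0o:5.1f}  K={k:.3f}")
```

Lowering the assumed $Z_{0,1}/Z_{0,2}$ ratio raises $K$ toward unity, in line with the preference for a low ratio stated above.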

#### SECOND HARMONIC CONTROL

In a differential PA, the even harmonics appear open-circuit at the input of the balun. However, the use of a QWTL impedance transformer at the input can provide very low impedance levels for the even harmonics at the PA reference plane, which conflicts with the loading conditions of class-E PA operation. To address this issue, the center of the floating metal layer in the re-entrant $\lambda/4$ sections is connected to ground by a via at a distance of $\lambda_{even}/8$ from the PA, as depicted in Fig. 5.6a. Therefore, thanks to the tight coupling between the top and floating metal layers, the TL is AC-grounded in the even mode, and is thus seen as an open circuit by the PA at the second harmonic, as shown by the EM simulation in Fig. 5.6b. In the odd mode, the center of the floating metal is virtually at ground, so the via barely affects the odd-mode impedance levels.

# **5.3.** DIGITALLY CONTROLLED CLASS-E DOHERTY PA

Figure 5.7: (a) Simplified linearly sized single-ended class-E digital Doherty PA with compensated impedance inverter, and (b) ACW and effective width of each branch vs. the input ACW.

In an RFDAC-based class-E digital Doherty PA, the output amplitude is directly modulated by changing the effective width or $R_{ON}$ of the final PA stage, as shown in Fig. 5.7a for a 10-bit Doherty DPA with two 9-bit DPAs. The input amplitude-control-word (ACW) varies from 0 to $ACW_{Max} = 1022$, and the ACWs of the main DPA ($ACW_M$) and peak DPA ($ACW_P$) both range from 0 to $ACW_{MP,Max} = 511$. For $ACW \le ACW_{MP,Max}$, we have $ACW_M = ACW$ and $ACW_P = 0$. For $ACW > ACW_{MP,Max}$, as shown in Fig. 5.7b, with the main DPA fully on ($ACW_M = ACW_{MP,Max}$), the peak DPA starts turning on ($ACW_P = ACW - ACW_{MP,Max}$). Note that this differs from a conventional Doherty PA operated in transconductance mode, where the input AM signal to the main PA continues to increase.
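The ACW distribution between the two branches described above can be written as a small helper. This is a minimal sketch; the function name `split_acw` is an assumption introduced here for illustration.

```python
ACW_MAX = 1022     # full-scale input ACW of the 10-bit Doherty DPA
ACW_MP_MAX = 511   # full-scale ACW of each 9-bit branch DPA


def split_acw(acw):
    """Map the input ACW to (ACW_M, ACW_P) for the main and peak DPAs."""
    if not 0 <= acw <= ACW_MAX:
        raise ValueError("ACW out of range")
    acw_main = min(acw, ACW_MP_MAX)       # main saturates at ACW_MP_MAX
    acw_peak = max(acw - ACW_MP_MAX, 0)   # peak turns on beyond ACW_MP_MAX
    return acw_main, acw_peak


if __name__ == "__main__":
    for acw in (0, 300, 511, 512, 1022):
        print(acw, split_acw(acw))
```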

The total drain capacitance ($C_D$, including the transistors and interconnect parasitics) is tuned for class-E operation at $ACW_M$ ($ACW_P$) = $ACW_{MP,Max}$. Therefore, when the main (peak) DPA is fully on, it operates in class-E mode. As the number of switching transistors in the main (peak) DPA decreases (at PBO), the fundamental impedances become complex with positive reactances, and the second harmonic impedances become mostly negative reactances (capacitive), resembling the operation of a class-J PA [38, 39]. For small $ACW_M$ ($ACW_P$) (< 30), the voltage swing on the drain of the main (peak) DPA is small; therefore, its operation is similar to that of a current source with almost linear behavior [25]. The change in $C_D$ is rather small since all the devices in the output stage are always in parallel. Only their gate potential changes, which affects $C_D$ to a small extent: it varies by 110 fF in total from ACW = 1 to ACW = 511 for each DPA, which is equivalent to a change of less than 3%. Consequently, the variations of $C_D$ for $ACW_M$ ($ACW_P$) > 30 do not change the intended class-E operation significantly. Using an analysis approach similar to that of Chapter 4, the single-ended Norton-equivalent linear time-invariant (LTI) model of the DPAs in the odd mode is shown in Fig. 5.8a-5.8d. Since the HWTL section of the compensated impedance inverter does not alter the impedances seen by the main and peak PAs at the center frequency $f_C$ (except for a phase offset), the conventional QWTL is used for simplicity in the theoretical derivations. The switching transistors are replaced by a set of parallel current sources representing the harmonics of the drain current.

For theoretical simplicity, the amplitude of the fundamental is assumed to be proportional to the total effective (switched-on) device width. The output resistance is modeled in parallel to the current sources and is inversely proportional to the total effective width. Ideally, the series resonator only allows the fundamental component $I_{H1}$ to pass through. Therefore, by neglecting the higher harmonics and using the superposition theorem, the output signal equals $V_{OUT} = V_{OUT,M} + V_{OUT,P}$, where $V_{OUT,M}$ and $V_{OUT,P}$ are the contributions of the main and peak DPAs to the output signal, respectively, given as follows:

$$V_{OUT,M} = \frac{K_M I_{H1}}{K_M / R_{D0} + j(1 - q_D^2) / (q_D^2 L_D \omega_0) + j L_D \omega_0 / (4R_L^2 - 4R_L^2 / q_D^2 + j 4R_L L_D \omega_0 (1 + K_P R_L / R_{D0}))}$$
(5.3)

$$V_{OUT,P} = \frac{K_P I_{H1}}{K_P / R_{D0} + j(1 - q_D^2) / (q_D^2 L_D \omega_0) + 1/R_L + jL_D \omega_0 / (4R_L^2 - 4R_L^2 / q_D^2 + j4R_L^2 L_D \omega_0 K_M / R_{D0})}$$
(5.4)

where $R_L$ is the load resistance seen from the matching network, $R_{D0}$ is the output resistance of a unit transistor with a width of $W_0$, and $K_M$ and $K_P$ are the ratios of the total width of the activated sub-PA cells of the main and peak DPAs to the width of the unit transistor, respectively. $L_D$ is the dc-feed inductance (implemented by wirebonds in this work), and $\omega_0$ is the radian center frequency. Since the variation in the total output capacitance of the transistors is small (< 3%), we consider $C_D$ to be constant. Therefore, the ACW-AM and ACW-PM functions can be easily calculated as $AM_{OUT} = \sqrt{V_{OUT,Re}^2 + V_{OUT,Im}^2}$ and $\phi_{OUT} = \arctan(V_{OUT,Im}/V_{OUT,Re})$. In contrast to the output phase, the normalized output amplitude is not a strong function of $q_D$; therefore, by assuming $q_D = 1$ for theoretical simplicity, the output amplitude can be calculated as follows:

$$AM_{OUT} \approx R_{D0}I_{H1} \times \left(\frac{K_M}{K_M + \frac{R_{D0}^2}{4K_P R_L^2 + 4R_L R_{D0}}} + \frac{K_P}{K_P + \frac{R_{D0}}{R_L} + \frac{R_{D0}^2}{4K_M R_L^2}}\right)$$

$$\approx R_{D0}I_{H1} \times \frac{2K_M K_P R_L^2 + K_M R_L R_{D0}}{K_M K_P R_L^2 + K_M R_L R_{D0} + R_{D0}^2/4}$$
(5.5)

By assuming  $K_M, K_P \gg R_{D0}/R_L$ , the normalized ACW-AM can be approximated by:

$$AM_{Norm}(K_M, K_P) \approx \frac{4K_M K_P K_{NL}^2 + 2K_M K_{NL}}{4K_M K_P K_{NL}^2 + 4K_M K_{NL} + 1}$$
(5.6)

where $K_{NL} = R_L/R_{D0}$ is defined as the nonlinearity factor. In a linearly sized array, for $ACW \le ACW_{MP,Max}$, $K_M = ACW$ and $K_P = 0$; otherwise $K_M = ACW_{MP,Max}$ and $K_P = ACW - ACW_{MP,Max}$. The calculated ACW-AM/PM curves and the full-circuit simulation results (differential class-E digital PA with a real transistor model and TL-based Marchand balun) are plotted in Fig. 5.9, showing good agreement for ACW-AM and reasonable agreement for ACW-PM between the proposed model and the real circuit simulation. As can be seen, although a switch-mode (class-E) DPA is a nonlinear time-variant circuit, the proposed LTI model provides good insight into the nonlinear behavior of the Doherty DPA in the fundamental band.
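The normalized ACW-AM model of (5.6) for a linearly sized array can be evaluated with a few lines of code. This is a sketch only: the function names are introduced here for illustration, and the nonlinearity factor $K_{NL} = 0.01$ is an assumed value, not the value of the realized design.

```python
ACW_MP_MAX = 511  # full-scale ACW of each branch DPA


def am_norm(k_main, k_peak, k_nl):
    """Normalized output amplitude according to Eq. (5.6)."""
    num = 4.0 * k_main * k_peak * k_nl ** 2 + 2.0 * k_main * k_nl
    den = 4.0 * k_main * k_peak * k_nl ** 2 + 4.0 * k_main * k_nl + 1.0
    return num / den


def acw_to_am_linear_sizing(acw, k_nl):
    """ACW-AM of a linearly sized Doherty DPA (K_M, K_P follow the ACW split)."""
    k_main = min(acw, ACW_MP_MAX)
    k_peak = max(acw - ACW_MP_MAX, 0)
    return am_norm(k_main, k_peak, k_nl)


if __name__ == "__main__":
    k_nl = 0.01  # assumed nonlinearity factor R_L / R_D0 (hypothetical)
    for acw in (0, 1, 64, 256, 511, 512, 768, 1022):
        print(f"ACW={acw:4d}  AM_norm={acw_to_am_linear_sizing(acw, k_nl):.4f}")
```

With the peak DPA off, the model output stays below 0.5, which corresponds to the 6 dB PBO point of the Doherty configuration.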

# **5.4.** System-Level Design Consideration

As explained in the previous section, a Doherty DPA with a conventional uniform array and single phase RF clocking is highly nonlinear as characterized by its static ACW-AM and ACW-PM curves. Such nonlinearities are typically corrected using digital predistortion (DPD), which can lead to nonuniform quantization effect [16, 25]. In addition, in a polar PA, since the AM and PM signal paths differ, they will have different time delays, thus requiring delay adjustments before reaching the final stage of the DPA. Furthermore, in a Doherty configuration, the paths of main and peak PAs also differ, thus requiring timing alignment between these two branches. In the following section, these system-level design considerations are explained in more detail.

#### **5.4.1.** NONUNIFORM QUANTIZATION

Figure 5.8: Simplified single-ended Norton-equivalent LTI model of (a) the digital Doherty PA, (b) main DPA for $ACW \le ACW_{MP,Max}$, (c) main DPA for $ACW > ACW_{MP,Max}$, and (d) the peak DPA for $ACW > ACW_{MP,Max}$.

Figure 5.9: Calculated and simulated ACW-AM and ACW-PM curves.

Figure 5.10: (a) ACW-AM curve of a 10-bit Doherty DPA with and without an ideal DPD, along with the inverse of the ACW-AM curve, and (b) the effect of a 10-bit nonuniform quantizer on the output spectrum, compared with an ideal 10-bit quantizer and 13-bit nonuniform quantizer.

While DPD is commonly used to linearize a nonlinear PA, the cascaded combination of DPD and an N-bit digital PA with a highly nonlinear ACW-AM curve constructs a nonuniform quantizer. Such a nonuniform quantizer cannot achieve the dynamic range (DR) and linearity levels expected from an ideal N-bit quantizer (i.e. the digital PA). As an illustration, Fig. 5.10a plots the ACW-AM curve of a 10-bit Doherty DPA with and without ideal DPD, the inverse of the ACW-AM curve, the probability distribution function (PDF) of a typical QAM signal, and the zoomed-in view around the transition point where the peak DPA starts operating. As can be seen, the quantization steps at small ACWs for both the main and peak DPAs are much larger than at larger ACWs. This is because the slope of the non-linearized ACW-AM curve decreases significantly as the ACW increases. The PDF of a QAM signal has its peak around the transition point, where the slope suddenly increases as the peak DPA turns on. Therefore, the RMS power of the quantization noise varies dynamically with the signal's amplitude, leading to degradation of the output spectral purity. In Fig. 5.10b, the effect of this phenomenon on the output spectrum is shown and compared with an ideal 10-bit quantizer and a nonlinear 13-bit DPA after ideal DPD. Compensating for such nonidealities requires about 2-3 extra bits in the DPA and in all of the preceding digital processing blocks, increasing the complexity, area and power consumption.
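To make the nonuniform quantization concrete, the sketch below lists the output levels that a linearly sized 10-bit Doherty DPA can produce according to the LTI model of (5.6) and compares the spacing between adjacent levels. The value $K_{NL} = 0.01$ and the helper names are assumptions for illustration; the actual step sizes of the realized PA come from circuit simulation, not from this model.

```python
ACW_MP_MAX = 511


def am_norm(k_main, k_peak, k_nl):
    """Normalized output amplitude according to Eq. (5.6)."""
    num = 4.0 * k_main * k_peak * k_nl ** 2 + 2.0 * k_main * k_nl
    den = 4.0 * k_main * k_peak * k_nl ** 2 + 4.0 * k_main * k_nl + 1.0
    return num / den


def output_levels(k_nl):
    """All amplitude levels reachable by the linearly sized Doherty DPA."""
    levels = []
    for acw in range(2 * ACW_MP_MAX + 1):
        k_main = min(acw, ACW_MP_MAX)
        k_peak = max(acw - ACW_MP_MAX, 0)
        levels.append(am_norm(k_main, k_peak, k_nl))
    return levels


def quantization_steps(k_nl):
    """steps[i] is the amplitude spacing between ACW = i and ACW = i + 1."""
    levels = output_levels(k_nl)
    return [hi - lo for lo, hi in zip(levels, levels[1:])]


if __name__ == "__main__":
    steps = quantization_steps(0.01)  # assumed K_NL (hypothetical)
    print("step at ACW 0 -> 1     :", steps[0])
    print("step at ACW 510 -> 511 :", steps[510])
    print("step at ACW 511 -> 512 :", steps[511])
    print("step at ACW 1021 -> 1022:", steps[1021])
```

Under these assumptions, the step just below the transition point is far smaller than the step at small ACWs, and the step size jumps again once the peak DPA turns on, which is the behavior described for Fig. 5.10a.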



Figure 5.11: (a) EVM and ACPR of a 64-QAM signal vs. AM-PM and main-peak timing mismatch normalized to 1/BW, and (b) block diagram of AM-PM and main-peak timing mismatch correction in a digital polar TX with Doherty DPA.

# 5.4.2. AM - PM TIMING MISMATCH

The AM and PM signals in a polar TX are separated from each other. After the CORDIC block at the input, the baseband digital AM signal can be directly applied to the digital PA array, while the digital baseband phase data are first up-converted to the RF carrier by a phase modulator, thus becoming a passband signal, and then applied to the digital PA cells. Consequently, these two signals pass through totally different channels with different timing delays. Because of the bandwidth expansion of the AM and PM signals, any timing mismatch will significantly degrade the adjacent-channel-power-ratio (ACPR) and EVM. Increasing the input signal bandwidth makes it even more challenging to achieve good linearity since it directly increases the impact of the time alignment errors, as shown in Fig. 5.11a. For example, for a signal bandwidth of 32 MHz, the timing mismatch should be less than 100 ps to leave enough margin for ACPR < -50 dBc and EVM < -45 dB after linearization. Therefore, as shown in Fig. 5.11b, tunable delay cells should be used in the AM and/or PM signal paths to correct for the timing mismatch between them.

## 5.4.3. MAIN - PEAK TIMING MISMATCH

In a Doherty PA, the output signals of the main and peak DPA pass through transmission lines of different length, thus resulting in different delays, which can degrade the ACPR and EVM significantly. The simulated effect of such timing mismatch on the ACPR and EVM is shown in Fig. 5.11a. In a typical analog Doherty PA, as shown in Fig. 5.2a, the output of the main PA passes through a QWTL, while at the input, the input of the peak PA passes through a QWTL. Therefore, the overall input/output signals of the main and peak DPA are automatically self-aligned and there is ideally no timing mismatch between them (note that in practical implementations, the different impedance terminations of the lines can strongly degrade this property). However, in a digital Doherty PA as shown in Fig. 5.11b, while the phases of the carrier signals are corrected by applying a 90° phase offset, the output signals are not automatically self-aligned. Therefore, this delay difference should be compensated accurately, which can be done in the digital domain using fractional delays realized as FIR filters. Furthermore, as can be seen from Fig. 5.11a, the EVM and ACPR are more sensitive to main-peak timing mismatch than AM-PM timing mismatch.
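The text states only that the fractional delays are realized as FIR filters. As one possible illustration, the sketch below uses a Lagrange-interpolation fractional-delay FIR, which is a common textbook choice and is an assumption here, not necessarily the filter used in this work. The delay is expressed in units of the sample period, and the function names are hypothetical.

```python
def lagrange_fractional_delay(delay, order=3):
    """FIR taps approximating a delay of `delay` samples (0 <= delay <= order).

    Accuracy is best when `delay` lies near order / 2.
    """
    taps = []
    for k in range(order + 1):
        h = 1.0
        for m in range(order + 1):
            if m != k:
                h *= (delay - m) / (k - m)
        taps.append(h)
    return taps


def fir_filter(taps, samples):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Delay a ramp by 1.3 samples; after the start-up transient the
    # output follows the ramp shifted by the requested delay.
    taps = lagrange_fractional_delay(1.3, order=3)
    ramp = [float(n) for n in range(12)]
    print(taps)
    print(fir_filter(taps, ramp))
```

In a main-peak alignment scheme, such a filter would be placed in the baseband path of one branch, with the delay value chosen to offset the measured path difference.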

# **5.5.** CIRCUIT-LEVEL LINEARIZATION

As explained in Section 5.4.1, a digital Doherty PA is in fact so nonlinear that even with ideal DPD, the nonlinearity lowers the effective number of bits, thus reducing the dynamic range of the output signal [16]. In this work, the digital PAs are made intrinsically linear by using three different circuit-level techniques: nonlinear sizing and overdrive-voltage control for ACW-AM correction, and multiphase RF clocking for ACW-PM correction [23–25]. As a result, not only is the burden on DPD to meet strict cellular wireless standards reduced, but the ACW-AM and ACW-PM distortions are also corrected sufficiently to pass the WiFi mask even without DPD. In the following section, these techniques are described in detail.

# 5.5.1. ACW-AM CORRECTION

Figure 5.12: (a) Concept of nonlinear sizing to its full extent in a digital Doherty PA, and (b) the total width of the main and peak DPA vs. ACW along with the resulting linear ACW-AM curve.

In a conventional digital PA (Fig. 5.7a), the sub-PA cells in the array are sized linearly, meaning that as the input ACW increases, the effective width of the total active cells ($W_{Eff}$) increases linearly. Linear sizing can result in substantial ACW-AM distortion, as shown in Fig. 5.7b. Assuming a width of $W_0$ for a unit cell, the effective width of the array is $ACW \cdot W_0$. In this work, as shown in Fig. 5.12a, in order to linearize the ACW-AM conversion, the sub-PA cells in both the main and peak DPAs are sized nonlinearly, meaning that as the ACW increases, the effective size of the total active cells increases nonlinearly. Assuming an N-bit fully thermometer-coded array comprising $2^N - 1$ cells, the transistors corresponding to small ACWs are sized smaller than $W_0$, and the transistors corresponding to large ACWs are sized larger than $W_0$. This yields a linear ACW-AM conversion, as shown in Fig. 5.12b. By calculating the inverse function of (5.6) for the main and peak DPAs, and then scaling its maximum to the same total width of $ACW_{MP,Max} \cdot W_0$, the widths of the main DPA transistors corresponding to each $ACW_M$ are initially calculated by:

$$W_{Eff,M,NL}[ACW] = \frac{ACW_M \cdot W_0}{1 + 4K_{NL}(ACW_{MP,Max} - ACW_M)}$$
(5.7)

and the widths of the peak DPA transistors corresponding to each  $ACW_P$  are given by:

$$W_{Eff,P,NL}[ACW] = W_0 \frac{F_{WP}ACW_{MP,Max}K_{NL} + F_{WP}/4 - ACW_{MP,Max}K_{NL}/2}{ACW_{MP,Max}K_{NL}^2 - F_{WP}K_{NL}^2ACW_{MP,Max}}$$
(5.8)

where $F_{WP}$ is given by:

$$F_{WP} = ACW_P \frac{AM_{Norm}(ACW_{MP,Max}, ACW_{MP,Max}) - AM_{Norm}(ACW_{MP,Max}, 0)}{ACW_{MP,Max}} + AM_{Norm}(ACW_{MP,Max}, 0)$$
(5.9)

Figure 5.13: (a) Concept of segmented nonlinear sizing shown for a digital PA, and (b) the resulting ACW-AM curve for a Doherty DPA with 8 segments in each main/peak DPA.
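Equations (5.7) to (5.9) can be checked numerically against the model of (5.6): feeding the nonlinearly sized widths back into $AM_{Norm}$ should give an amplitude that is linear in the ACW within each branch. The sketch below does this with widths expressed in units of $W_0$; the helper names and the value $K_{NL} = 0.01$ are assumptions for illustration.

```python
ACW_MP_MAX = 511


def am_norm(k_main, k_peak, k_nl):
    """Normalized output amplitude according to Eq. (5.6)."""
    num = 4.0 * k_main * k_peak * k_nl ** 2 + 2.0 * k_main * k_nl
    den = 4.0 * k_main * k_peak * k_nl ** 2 + 4.0 * k_main * k_nl + 1.0
    return num / den


def width_main(acw_m, k_nl):
    """Eq. (5.7): effective main-DPA width in units of W0."""
    return acw_m / (1.0 + 4.0 * k_nl * (ACW_MP_MAX - acw_m))


def f_wp(acw_p, k_nl):
    """Eq. (5.9): target (linear) amplitude while the peak DPA turns on."""
    am_lo = am_norm(ACW_MP_MAX, 0, k_nl)
    am_hi = am_norm(ACW_MP_MAX, ACW_MP_MAX, k_nl)
    return acw_p * (am_hi - am_lo) / ACW_MP_MAX + am_lo


def width_peak(acw_p, k_nl):
    """Eq. (5.8): effective peak-DPA width in units of W0."""
    f = f_wp(acw_p, k_nl)
    a = ACW_MP_MAX
    num = f * a * k_nl + f / 4.0 - a * k_nl / 2.0
    den = a * k_nl ** 2 - f * k_nl ** 2 * a
    return num / den


if __name__ == "__main__":
    k_nl = 0.01  # assumed nonlinearity factor (hypothetical)
    am_full_main = am_norm(ACW_MP_MAX, 0, k_nl)
    for acw in (1, 128, 256, 384, 511):
        am = am_norm(width_main(acw, k_nl), 0, k_nl)
        print(f"ACW_M={acw:3d}  W_main/W0={width_main(acw, k_nl):8.3f}  "
              f"AM/AM(511)={am / am_full_main:.4f}  ideal={acw / ACW_MP_MAX:.4f}")
    # Width of the individual cell added at each code: W[ACW] - W[ACW - 1].
    print("first cell:", width_main(1, k_nl) - width_main(0, k_nl))
    print("last cell :", width_main(511, k_nl) - width_main(510, k_nl))
```

For the assumed $K_{NL}$, the first cell comes out much smaller than $W_0$ and the last cell much larger, consistent with the sizing trend described for the thermometer-coded array.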

Figure 5.14: Concept of multiphase RF clocking for ACW-PM correction shown for a nonlinearly sized Doherty DPA.

Due to the impact of other nonidealities, it is more practical to extract $W_{Eff}$ by simulating the ACW-AM curve of a linearly sized digital PA, then calculating the inverse curve and scaling its maximum to $ACW_{MP,Max} \cdot W_0$. Using (5.7), the width of the transistor corresponding to each ACW is calculated as $W_{Eff,NL}[ACW] - W_{Eff,NL}[ACW - 1]$. Obviously, for a nonlinearly sized N-bit digital PA, this results in $2^N - 1$ different transistor sizes, requiring a fully thermometer-coded array, which results in high power consumption in the driver stages. Therefore, to benefit from the reduction in array complexity and power consumption offered by the well-known binary-unary segmentation [40], segmented nonlinear sizing is used in this work. In a segmented nonlinearly sized digital PA, as shown in Fig. 5.13a, the array is divided into multiple segments with the same ACW range but different total sizes. While the effective size of the active cells inside each segment increases linearly, the overall effective size of the total active cells increases nonlinearly, such that the resulting $W_{Eff}[ACW]$ curve is a piecewise-linear version of the original fully nonlinearly sized $W_{Eff}[ACW]$ curve. Since the cells inside each segment are sized linearly, it is possible to apply binary-unary segmentation to reduce the power consumed by the drivers. Increasing the number of segments improves the overall linearity. As shown in Chapter 4, 8 segments are enough to lower the ACPR and EVM to an acceptable level with enough margin for other sources of nonlinearity. The simulated ACW-AM curve of a segmented nonlinearly sized Doherty DPA with 8 segments in each of the main and peak DPAs is plotted in Fig. 5.13b, showing a significant improvement in ACW-AM linearity over a Doherty DPA using uniformly sized arrays.
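The piecewise-linear approximation behind segmented nonlinear sizing can be sketched for the main DPA as follows. The segment boundaries are assumed to be equally spaced in ACW, the widths are in units of $W_0$, and $K_{NL} = 0.01$ is a hypothetical value; the sizing of the realized chips is derived from circuit simulation, so the numbers printed here are illustrative only.

```python
ACW_MP_MAX = 511


def width_main(acw_m, k_nl):
    """Eq. (5.7): fully nonlinear effective main-DPA width in units of W0."""
    return acw_m / (1.0 + 4.0 * k_nl * (ACW_MP_MAX - acw_m))


def segment_edges(n_seg=8):
    """ACW values at the segment boundaries (equal ACW range per segment)."""
    return [(i * ACW_MP_MAX + n_seg // 2) // n_seg for i in range(n_seg + 1)]


def segment_widths(k_nl, n_seg=8):
    """Total width of each segment, taken from the nonlinear curve at the edges."""
    w = [width_main(e, k_nl) for e in segment_edges(n_seg)]
    return [w[i + 1] - w[i] for i in range(n_seg)]


def width_segmented(acw_m, k_nl, n_seg=8):
    """Piecewise-linear effective width: linear sizing inside each segment."""
    if not 0 <= acw_m <= ACW_MP_MAX:
        raise ValueError("ACW_M out of range")
    edges = segment_edges(n_seg)
    for i in range(n_seg):
        lo, hi = edges[i], edges[i + 1]
        if acw_m <= hi:
            w_lo, w_hi = width_main(lo, k_nl), width_main(hi, k_nl)
            return w_lo + (w_hi - w_lo) * (acw_m - lo) / (hi - lo)
    raise ValueError("unreachable for a valid ACW_M")


if __name__ == "__main__":
    k_nl = 0.01  # assumed nonlinearity factor (hypothetical)
    print("edges :", segment_edges())
    print("widths:", [round(w, 2) for w in segment_widths(k_nl)])
    worst = max(abs(width_segmented(a, k_nl) - width_main(a, k_nl))
                for a in range(ACW_MP_MAX + 1))
    print("largest deviation from the fully nonlinear curve (in W0):", worst)
```

Raising `n_seg` shrinks the deviation from the fully nonlinear curve, at the cost of more distinct cell sizes.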

As can be seen from (5.7) and (5.8) for a nonlinearly sized array, the optimum sizes of the sub-PA cells depend on the nonlinearity factor  $K_{NL} = R_L/R_{D0}$ . However, after fabrication or during the operation of the chip, the load or frequency may change, which will change  $R_L$ . In addition, the process/voltage/temperature variations will change  $R_{D0}$ , thus changing  $K_{NL}$  from its desired design value. Consequently, the resulting ACW-AM curve will deviate somewhat from its optimum linearity in a practical implementation. To correct for this, we can tune  $K_{NL}$  by tuning the on-resistance of the transistors. Since  $R_{D0} = (W_0/L \times K_n \times V_{OD})^{-1}$ , we can tune  $K_{NL}$  by controlling the overdrive-voltage  $V_{OD}$ . To facilitate this, the VDD of the buffers that drive the output transistors is tuned. Therefore, since the peak voltage of the RF clock changes, changing the overdrive-voltage  $V_{OD} = V_{GS} - V_{TH}$ , the ACW-AM curve can be linearized back to its desired level.
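The overdrive-voltage tuning described above follows directly from $K_{NL} = R_L/R_{D0}$ with $R_{D0} = (W_0/L \times K_n \times V_{OD})^{-1}$. The sketch below computes the $V_{OD}$ that restores a target $K_{NL}$ after the load changes. All numerical values and function names are hypothetical and chosen only to make the arithmetic visible.

```python
def r_d0(w0_over_l, k_n, v_od):
    """Unit-transistor on-resistance: R_D0 = 1 / (W0/L * K_n * V_OD)."""
    return 1.0 / (w0_over_l * k_n * v_od)


def v_od_for_target_knl(k_nl_target, r_load, w0_over_l, k_n):
    """Overdrive voltage giving K_NL = R_L / R_D0 = R_L * (W0/L) * K_n * V_OD."""
    return k_nl_target / (r_load * w0_over_l * k_n)


if __name__ == "__main__":
    # Hypothetical device and load parameters (assumptions for illustration).
    w0_over_l = 100.0   # unit-cell aspect ratio W0/L
    k_n = 400e-6        # transconductance parameter in A/V^2
    k_nl_design = 0.01  # K_NL assumed during sizing

    for r_load in (0.5, 0.6, 0.4):  # load resistance in ohms
        v_od = v_od_for_target_knl(k_nl_design, r_load, w0_over_l, k_n)
        k_nl = r_load / r_d0(w0_over_l, k_n, v_od)
        print(f"R_L={r_load:.2f} ohm  ->  V_OD={v_od:.4f} V  (K_NL={k_nl:.4f})")
```

A higher load resistance calls for a lower overdrive voltage, and vice versa, so that the product $R_L \cdot V_{OD}$ and hence $K_{NL}$ stays at its design value.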

#### 5.5.2. ACW-PM CORRECTION

In a conventional digital PA with single-phase RF clocking, all of the sub-PA cells are driven by the same (modulated) RF clock. In energy-efficient digital class-E PAs, which rely on reactive loading, the variation in the on-resistance of the transistors with the ACW yields significant ACW-PM distortion, as shown in Fig. 5.9. To correct for this, a concurrent multiphase RF clocking technique is used to reduce the ACW-PM conversion. For a single digital PA line-up, the resulting ACW-PM curve using five multiphase RF clocks is shown in Fig. 5.14. In this technique, different phases of the RF clock are applied to the various segments of the digital PA array. The output currents of these segments are summed, so the overall output phase is averaged, resulting in a considerable reduction in ACW-PM distortion.

The delayed RF clocks are generated by a bank of delay-lines. Since their delay can change due to PVT variations, or the ACW-PM curve itself can also change due to variations in the load or frequency, the delay lines are designed to be partly digitally programmable in order to compensate for the PVT / load / frequency variations. Once the ACW-PM is flattened, the normalized ACW-AM curve will be still almost identical to that of a single-phase nonlinearly sized digital PA. Therefore, no dynamic modification is needed for each ACW. Once the delay offsets are programmed they are fixed during the normal operation. The required phase offsets are roughly proportional to the phase error of each segment with respect to the output phase at maximum ACW. In practice, during the design process or chip operation, the delay offsets can be found using an iterative algorithm, as proposed in [25].

#### **5.6.** IMPLEMENTATION AND FABRICATION

#### 5.6.1. CMOS CHIPS

Since the loads seen by the main and peak DPAs are not the same (except at peak power), their ACW-AM and ACW-PM curves are also different. Therefore, two chips with the same structure but different nonlinearly sized segments and delay offsets have been designed. The overall block diagram of the chips as well as the conceptual layout of the nonlinearly sized array are shown in Fig. 5.15. Since the sub-PA cells of the 8<sup>th</sup> segment are very large, they are implemented in two parallel rows, each with half the size of segment 8, as shown in Fig. 5.15b. The arrays of the main and peak DPAs are both 9-bit, each with a total width of 2.555 mm distributed over 8 segments with different sizings, as shown in Fig. 5.15c. Each segment consists of 16 thermometer-coded MSB cells and three LSB cells, which are 1/16 and 1/64 the total size of each segment, respectively.

In order to control the overdrive-voltage, a programmable on-chip low-drop-out (LDO) voltage regulator has been designed, as depicted in Fig. 5.16a. The input reference voltage of the LDO is controlled by a 6-bit R-2R digital-to-analog converter (DAC), while the output voltage of the LDO supplies the positive dc voltage of the buffers which drive



Figure 5.15: (a) Overall structure of main/peak DPA, (b) the conceptual layout of the nonlinearly sized array where only the output MSB transistors are shown, and (c) the realized sizing of the MSB sub-PA cells of the main and peak DPA compared to a conventional uniform array with the same total size.

the output transistors. In each chip, there is only one LDO for the entire array. The LDO is capable of driving 50 mA with a resolution of 10-12 mV. The input RF-clock and BB-clock are amplified by on-chip differential amplifiers and then converted to single-ended clocks. Although the input RF clock amplifier and the digital buffers are designed to have a 50% duty cycle, in practice due to the PVT variations, the duty cycle might change, degrading the output power/efficiency or linearity. Therefore, an on-chip 6-bit



Figure 5.16: (a) 6-bit programmable LDO for overdrive voltage tuning and the sub-PA unit cell circuit, and (b) 6-bit programmable duty-cycle correction (DCC) circuit.

programmable automatic duty cycle correction (DCC) circuit, shown in Fig. 5.16b, has been designed to compensate for such practical nonidealities. The DCC first monitors the dc voltage of the RF clock comparing it with a reference voltage supplied by a 6-bit R-2R DAC, then adjusts the dc voltage of the RF clock path. Because of the voltage clipping caused by the digital buffers, changing the offset voltage of the RF clock modifies the duty cycle within a control range of 33 %–66 %. The output of the DCC is applied to the multiphase RF clocking generator, which consists of 5 fine resolution single-ended delay lines. The outputs of the 1<sup>st</sup> to 5<sup>th</sup> delay offsets are applied to the segments 1–2, 3–4, 5–6, 7 and 8, respectively. The required resolution of delay offsets is less than 6 ps, which is realized with 4-bit programmable Vernier (relative) delay lines to cover the PVT / frequency / load variations [25]. The outputs of the delay lines are converted to differential signals before being applied to the digital PA array. Furthermore, clock gating is applied in the paths of the RF clocks to reduce the drivers' power consumption in the power backoff range.

In order to correct for the timing mismatch between the AM and PM paths, a digital 10-tap FIR filter is implemented on-chip as a fractional delay cell [41] in the path of the ACW data, as depicted in Fig. 5.15a. The coefficients of the filter are given by $h[n] = \sin[\pi(n - \Delta)]/[\pi(n - \Delta)]$, in which *n* is the tap index and $\Delta$ is the desired delay (the group delay of the FIR filter) expressed as a fraction of the sampling time $T_S = 1/F_S$. For example, for a delay of 200 ps with a 500 MHz sampling rate, the impulse response (coefficients) and frequency response of the FIR filter are plotted in Fig. 5.17a and 5.17b. The chips are fabricated in 40 nm bulk CMOS. The core area of each DPA including the multiphase RF clocking and LDO blocks is 0.8 mm × 0.3 mm. The die micrograph of the two chips (main and peak DPA) is shown in Fig. 5.18. The LDO settings, delay offsets,



Figure 5.17: (a) Impulse response (coefficients), and (b) the frequency response of the FIR-based digital fractional delay for a delay of 200 ps with a sampling rate of 500 MHz.



Figure 5.18: Die micrograph of the main and peak DPAs.

and coefficients of the FIR filter are programmed via a SPI interface. The input ACW data are also loaded via the SPI interface to an on-chip 4 K-sample SRAM memory. During normal operation, the stored ACW data words are read out in a loop to be fed to the DPA array using the BB clock.
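The fractional-delay coefficients described above can be reproduced with a few lines of Python. This is a minimal sketch, not the on-chip implementation: the function name, the choice of centring the 10 taps on index 4 (an integer bulk delay that makes the filter causal), and the absence of windowing or coefficient quantization are all assumptions, since the text does not specify them.

```python
import math

def fractional_delay_taps(delay_s, fs_hz, num_taps=10):
    """Truncated-sinc taps h[n] = sin(pi*(n - D)) / (pi*(n - D))."""
    frac = delay_s * fs_hz          # Delta: desired delay in sample periods
    center = (num_taps - 1) // 2    # assumed bulk delay (hypothetical centring)
    taps = []
    for k in range(num_taps):
        x = (k - center) - frac     # n - Delta, with n = k - center
        if abs(x) < 1e-12:
            taps.append(1.0)        # limit of sin(pi*x)/(pi*x) at x = 0
        else:
            taps.append(math.sin(math.pi * x) / (math.pi * x))
    return taps

# Example from the text: 200 ps delay at a 500 MHz sampling rate
delta = 200e-12 * 500e6             # fraction of one sample period
taps = fractional_delay_taps(200e-12, 500e6)
peak_index = max(range(len(taps)), key=lambda k: abs(taps[k]))
```

For the 200 ps example, $\Delta$ is one tenth of a sample period, and the largest coefficient sits at the assumed centre tap with the remaining taps forming the decaying sinc tails.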



Figure 5.19: (a) Conceptual structure of the proposed Doherty matching network, (b) connection of the DPA to the matching network, and (c) final realization of the proposed Doherty matching network.

#### 5.6.2. BALUN AND MATCHING NETWORK

In this work, the compensated impedance inverter is combined with a Marchand balun with re-entrant coupled lines to form the wideband load network of the proposed Doherty DPA, as depicted in Fig. 5.19a. The wire bonding structure connecting DPA chips to the matching network is shown in Fig. 5.19b. The re-entrant coupled lines are adopted

to achieve tight coupling without violating the stringent fabrication design rules. The design parameters for the re-entrant type coupled lines are $\epsilon_{r1} = 10.2$, $H_1 = 0.13$ mm and $\epsilon_{r2} = 3.0$, $H_2 = 0.75$ mm. The width of the top metal layer lines is $W_1 = 1.5$ mm with $S = 0.2$ mm spacing, and the width of the middle metal layer is $W_2 = 3.2$ mm, resulting in $Z_{0e} = 71\ \Omega$ and $Z_{0o} = 7.5\ \Omega$ impedances for the main DPA. The even- and odd-mode wavelengths at $f_0 = 2.5$ GHz are around $\lambda_{even} = 59$ mm and $\lambda_{odd} = 36$ mm. The $\lambda_{odd}/4$ and $\lambda_{odd}/2$ re-entrant coupled (differential) TL sections are placed in front of the main and peak DPA, respectively (as described in Section 5.2.2), to create a wideband compensated impedance inverter, and also to connect them to the Marchand balun.

The Marchand balun is optimized to compensate for the non-perfect ground (via inductance) and port transitions. A $\lambda/4$ transmission line is placed after the Marchand balun to match to 50 $\Omega$. Moreover, since creating blind vias (e.g., from the middle layer to the bottom) is not practical, two islands with a through via from the top metal layer to the bottom ground plate, as shown in Fig. 5.19c, are placed at an optimized distance in front of the main and peak DPA to provide a second-harmonic open impedance. Due to nonideal effects, in practical situations this distance is slightly different from $\lambda_{even}/8$. Furthermore, besides the use of the compensated $\lambda_{odd}/4$–$\lambda_{odd}/2$ Doherty power combiner, the succeeding cascaded impedance-stepped TL sections further increase the bandwidth.

#### **5.6.3.** OVERALL IMPLEMENTATION

The main and peak chips are mounted on an FR-4 PCB, while the Marchand balun is implemented separately on a two-layer Rogers material PCB, as shown in Fig. 5.20a. The top layer of the matching network is Rogers-3003 and the bottom layer Rogers-3010. Both of the PCBs are mounted on an FR-4 substrate as the base. The area of the matching network PCB is 41.4 mm×32 mm. The inductances of the shunt and series resonators are implemented by 3 and 4 parallel wire bonds respectively, as shown in Fig. 5.20b. Chip capacitors are used to complete the implementation of the series resonator, and to realize the decoupling capacitors of the dc feed. The assembly structure of the PCBs with DPA chips, wire bonds and the chip capacitors for the RF and dc feed connection and decoupling is shown in Fig. 5.20c. Transformer-based RF baluns are used on the FR-4 PCB to convert the single-ended clocks to differential ones before feeding them to the DPA chips.






Figure 5.20: (a) PCB photograph, (b) the structure of wire-bonding the DPA chips to the matching network, and (c) the side view of the dc feed/decoupling and RF path connections.






Figure 5.21: Measurement setups used for (a) static, and (b) dynamic measurements.

#### **5.7.** MEASUREMENT RESULTS

#### **5.7.1. STATIC MEASUREMENTS**

Two different measurement setups are used for the static and dynamic measurements, as shown in Fig. 5.21a and Fig. 5.21b, respectively. In the static measurements, a signal generator with a power divider is used to provide the BB clock for both DPAs, and another signal generator with a hybrid power divider is used to provide two RF clocks with a 90° phase difference for the main and peak DPAs. The BB clock frequency is 500 MHz, which is limited by the readout speed of the SRAM used to store the data, while the RF clock



Figure 5.22: Measured (a)  $P_{OUT}$  at full power and power backoff for different  $VDD_{Main}$ , (b) drain efficiency with  $VDD_{Main} = 0.6 V$  vs. center frequency, (c) drain efficiency, and (d) power-added efficiency with  $VDD_{Main} = 0.6 V$  vs. output power.

varies between 2 GHz and 3 GHz.

#### **POWER/EFFICIENCY MEASUREMENT**

The output power ( $P_{OUT}$ ), drain efficiency (DE) and power-added efficiency (PAE) are measured with different VDDs ranging from 0.5 V to 0.7 V, for CW output signals over the frequency range of 2 GHz–3 GHz, both at full power ( $ACW_M = ACW_P = 511$ ) and back-off power ( $ACW_M = 511$ ,  $ACW_P = 0$ ). The ACW data are generated in MATLAB and then loaded into the on-chip SRAMs. The output power is measured using a power meter. The



Figure 5.23: Measured static (a) ACW-AM, and (b) ACW-PM at  $F_C$  = 2.5 GHz without using DPD, compared to a simulated digital Doherty PA using conventional uniformly sized output stages.

PAE includes the power consumption of all the main building blocks on the chips, such as the power, sub-PA drivers, digital decoder/encoders, multiphase RF clocking circuit, DCC and LDO. The measured peak and backoff output power over the 2 GHz–3 GHz range are shown in Fig. 5.22a, and range from 16 dBm to 19 dBm and from 10.6 dBm to 13.6 dBm, respectively. The measured peak and backoff DE over the 2 GHz–3 GHz range are shown in Fig. 5.22b. As can be seen, DE is more than 50% within the 2.35 GHz–2.8 GHz frequency range. The efficiency at the backoff power level is within 10% of its maximum value over a 750 MHz span, which is equivalent to a 30% relative bandwidth. At $F_C$ = 2.4 GHz with VDD<sub>Main</sub> = 0.6 V and VDD<sub>Peak</sub> = 0.7 V, the peak/backoff DE and PAE are 57 % / 52 % and 36 % / 25 %, respectively. The DE and PAE are plotted versus output power in Fig. 5.22c and Fig. 5.22d, showing a well-shaped Doherty efficiency curve.

#### LINEARITY MEASUREMENT

Using a similar measurement setup, the static linearity is measured using a spectrum analyzer at the output to down-convert and digitize the output signal to digital baseband. Since the input signal to the DPAs is digital, it is easy to generate a perfect quantized triangle (or ramp) signal for measuring the ACW-AM and ACW-PM conversion curves. For this purpose, a 4096-sample triangle signal is generated in MATLAB, from which the main $ACW_M$ and peak $ACW_P$ signals are created and then loaded into the on-chip SRAMs. These SRAMs are read out in a loop with a 500 MHz clock frequency, creating a 122.07 kHz



Figure 5.24: Measured results of a 16 MHz OFDM signal at  $F_C = 2.5$  GHz: output spectrum (a) without DPD, and (b) with ILC DPD; (c) ACW-AM with and without ILC DPD, and (d) ACW-PM with and without ILC DPD .

triangle waveform as the input signal for the DPA branches. The digital down-converted output data are processed in MATLAB to extract the ACW-AM and ACW-PM curves. The delay mismatch between the main and peak branches is also measured. The integer part is compensated in MATLAB, while the fractional part is programmed into the chips. The resulting static linearity curves are measured at  $F_C = 2.5$  GHz. These results are plotted in Fig. 5.23. As can be seen, compared to a Doherty DPA with conventional segmentation, the proposed Doherty DPA shows a significant improvement in the linearity without using any DPD.
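The repetition rate of the looped test signal follows directly from the SRAM depth and the readout clock. The sketch below generates one period of a quantized triangle and computes that rate. The function name and the 0 to 511 code range are assumptions made for illustration, and the split of the ramp into $ACW_M$ and $ACW_P$ is not modelled because it is not specified in this section.

```python
def triangle_codes(num_samples=4096, max_code=511):
    """One period of a quantized triangle: ramp up, then ramp down."""
    half = num_samples // 2
    up = [round(max_code * i / half) for i in range(half)]
    down = [round(max_code * (half - i) / half) for i in range(half)]
    return up + down

fs_hz = 500e6                            # SRAM readout (BB clock) rate
codes = triangle_codes()
repetition_hz = fs_hz / len(codes)       # one triangle period per SRAM loop
print(f"{repetition_hz / 1e3:.2f} kHz")  # prints 122.07 kHz
```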



Figure 5.25: Measured spectrum of a 32 MHz OFDM signal at  $F_C$  = 2.5 GHz with ILC DPD.

#### **5.7.2.** MODULATED SIGNAL MEASUREMENTS

The proposed Doherty DPA is also measured with modulated signals using the measurement setup shown in Fig. 5.21b. The input I/Q signal is converted to digital AM and PM signals in MATLAB with $F_S = 500$ MHz. A 12 GSa/s arbitrary waveform generator (AWG) is used to generate the phase-modulated RF signals. For this purpose, the phase data are up-converted in MATLAB to a 2.5 GHz sine wave and then loaded into the AWG. The AM data are converted to $ACW_M$ and $ACW_P$, and loaded into the on-chip SRAM memories running at 500 MHz. The BB clocks are generated using a signal generator with a power divider. Similar to the static measurements, the delay mismatches between the main and peak branches and between the AM and PM signals are compensated in MATLAB for the integer part and in the on-chip FIR filters for the fractional part. The output spectrum of a 16 MHz OFDM signal with PAPR = 8.1 dB is measured without and with DPD, as shown in Fig. 5.24a and Fig. 5.24b, respectively. The measured ACPR and EVM without DPD are -41 dBc and -36 dB, respectively. By using a simple DPD based on iterative learning control (ILC) with an LUT [16] (which will be described in detail in Chapter 6), the measured ACPR and EVM improve to -52 dBc and -50 dB, respectively. The measured ACW-AM and ACW-PM curves of the 16 MHz OFDM signal, with and without ILC DPD, are shown in Fig. 5.24c and Fig. 5.24d. The output spectrum of a 32 MHz OFDM signal is also measured with ILC DPD, as shown in Fig. 5.25. The measured ACPR and EVM with ILC DPD are -48 dBc and -48 dB, respectively. Table 5.1 summarizes and compares the performance of this work with the prior art digital Doherty PAs.
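The efficiency-improvement column of Table 5.1 is defined in its footnote as the ratio of the efficiency at 6 dB PBO to that of a normalized class-B PA at the same backoff. The sketch below shows one reading of that definition that appears to reproduce the tabulated values. It assumes the class-B reference is the listed efficiency at $P_{sat}$ scaled by the output amplitude ratio, and that the improvement is the ratio minus one; the function name is hypothetical.

```python
def efficiency_improvement_pct(eff_pbo_pct, eff_psat_pct, pbo_db=6.0):
    """Improvement over a class-B PA normalized to the same peak efficiency."""
    # Class-B efficiency is proportional to output amplitude: 10^(-PBO/20)
    class_b_at_pbo = eff_psat_pct * 10 ** (-pbo_db / 20)
    return 100 * (eff_pbo_pct / class_b_at_pbo - 1)

# Efficiency pairs (at 6 dB PBO, at Psat) taken from Table 5.1
this_work = efficiency_improvement_pct(52, 54)   # DE values
ref_13 = efficiency_improvement_pct(34, 50)      # DE values
ref_14 = efficiency_improvement_pct(34, 45)      # PAE values
```

Rounded to the nearest percent, these evaluate to 92, 36 and 51, matching the corresponding table entries.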

#### **5.8.** CONCLUSION

A highly linear wideband class-E CMOS digital Doherty power amplifier is presented. Closed-form equations are derived to predict the ACW-AM and ACW-PM curves. By using a wideband Marchand balun with re-entrant coupled lines for the output matching network, more than 50 % drain efficiency (DE) at a $\sim$6 dB power backoff (PBO) level within the 2.35 GHz–2.8 GHz frequency range is achieved. The drain efficiency at 6 dB PBO is within 10% of its maximum value over a 750 MHz span, which is equivalent to a 30% relative bandwidth. The measured peak / 6 dB PBO $P_{OUT}$, DE and PAE at 2.4 GHz are 17.5 / 12.2 dBm, 57 / 52 % and 36 / 25 % with VDD Main / Peak = 0.6 / 0.7 V. The linearity is significantly improved by the nonlinear sizing of the DPA arrays, along with overdrive-voltage control and concurrent multiphase RF clocking, as well as by accurately compensating the time/phase mismatch between the main and peak branches and between the AM and PM signals. In order to achieve maximum intrinsic linearity, two different chips with the same architecture, but with different design parameters, are fabricated as the main and peak amplifiers. Measured results show a -41 dBc ACPR and -36 dB EVM for a 16 MHz OFDM signal at 2.5 GHz without using DPD. By using DPD, the measured ACPR and EVM of the 16 MHz OFDM signal are -52 dBc and -50 dB, respectively. For a 32 MHz OFDM signal, the measured ACPR and EVM with DPD are -48 dBc and -48 dB, respectively.

The proposed concept in this work is scalable to higher power levels. Future versions will include on-chip phase-modulators and complete (both integer and fractional) delay calibration blocks, eliminating the need for any off-chip phase modulation or signal processing.

| Reference | Technology | Frequency (GHz) | VDD (V) | Psat (dBm) | DE / PAE @ Psat (%) | DE / PAE @ 6 dB PBO (%) | Efficiency Improvement\* @ 6 dB PBO (%) | Modulated Signal BW (MHz) | ACPR (dBc) / EVM (dB) | Matching Network |
|-----------|------------|-----------------|---------|------------|---------------------|-------------------------|------------------------------------------|---------------------------|-----------------------|------------------|
| This Work | CMOS 40nm | 2.5 | 0.6 / 0.7 | 17.5 | 54 / 34 | 52 / 25 | 92 | 16 (OFDM) | -52 / -50 | Off-Chip |
|  |  |  |  |  |  |  |  | 32 (OFDM) | -48 / -48 |  |
| [14] | CMOS 65nm (LP) | 0.9 | 2.4 | 24 | N.A. / 45 | N.A. / 34 | 51 | 40 (802.11ac) | -45 / -35 | Off-Chip |
| [13] | CMOS 40nm | 2.5 | 0.7 | 21.4 | 50 / N.A. | 34 / N.A. | 36 | 20 (64QAM) | -35\*\* / -30 | On-Chip |
| [12] | CMOS 65nm | 3.5 | 3 | 27 | 30 / 26 | 21 / N.A. | 40 | 0.5 (16QAM) | -28 / -27 | On-Chip |
| [11] | CMOS 130nm | 2.4 | 3.3 | 32 | N.A. / 51 | N.A. / 27\*\* | 6 | 20 (OFDM) | -25 / -25 | Partially On-Chip |

Table 5.1: Performance Summary and Comparison Table with Prior Art Digital Doherty PAs

\* Ratio of DE @6dB PBO to the DE of a normalized class-B PA @6dB PBO.

\*\* Estimated graphically.

#### REFERENCES

- [1] M. Hashemi, L. Zhou, Y. Shen, and L. C. N. de Vreede, *A Highly Linear Wideband Polar Class-E CMOS Digital Doherty Power Amplifier*, IEEE Transactions on Microwave Theory and Techniques **67**, 4232 (2019).
- [2] S. Cripps, *RF Power Amplifiers for Wireless Communications*, Artech House microwave library (Artech House, 2006).
- [3] G. Hanington, P.-F. Chen, P. M. Asbeck, and L. E. Larson, *High-efficiency power amplifier using dynamic power-supply voltage for CDMA applications*, IEEE Transactions on Microwave Theory and Techniques **47**, 1471 (1999).

- [4] J. Staudinger, B. Gilsdorf, D. Newman, G. Norris, G. Sadowniczak, R. Sherman, and T. Quach, *High efficiency CDMA RF power amplifier using dynamic envelope tracking technique*, in 2000 IEEE MTT-S International Microwave Symposium Digest (Cat. No.00CH37017), Vol. 2 (2000) pp. 873–876 vol.2.
- [5] W. H. Doherty, A New High Efficiency Power Amplifier for Modulated Waves, Proceedings of the Institute of Radio Engineers 24, 1163 (1936).
- [6] F. H. Raab, *Efficiency of Doherty RF Power-Amplifier Systems*, IEEE Transactions on Broadcasting BC-33, 77 (1987).
- [7] R. J. McMorrow, D. M. Upton, and P. R. Maloney, *The microwave Doherty amplifier*, in 1994 IEEE MTT-S International Microwave Symposium Digest (Cat. No.94CH3389-4) (1994) pp. 1653–1656 vol.3.
- [8] J. H. Qureshi, N. Li, W. C. E. Neo, F. van Rijs, I. Blednov, and L. C. N. de Vreede, A wide-band 20W LMOS Doherty power amplifier, in 2010 IEEE MTT-S International Microwave Symposium (2010) pp. 1504–1507.
- [9] A. Grebennikov and J. Wong, A Dual-Band Parallel Doherty Power Amplifier for Wireless Applications, IEEE Transactions on Microwave Theory and Techniques 60, 3214 (2012).
- [10] W. Wu, R. B. Staszewski, and J. R. Long, A 56.4-to-63.4 GHz Multi-Rate All-Digital Fractional-N PLL for FMCW Radar Applications in 65 nm CMOS, IEEE Journal of Solid-State Circuits 49, 1081 (2014).
- [11] N. Ryu, S. Jang, K. C. Lee, and Y. Jeong, CMOS Doherty Amplifier With Variable Balun Transformer and Adaptive Bias Control for Wireless LAN Application, IEEE Journal of Solid-State Circuits 49, 1356 (2014).
- [12] S. Hu, S. Kousai, J. S. Park, O. L. Chlieh, and H. Wang, *Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS*, IEEE Journal of Solid-State Circuits 50, 1094 (2015).
- [13] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, *A fully-integrated digital-intensive polar Doherty transmitter*, in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC) (2017) pp. 196–199.

- [14] V. Vorapipat, C. S. Levy, and P. M. Asbeck, *Voltage Mode Doherty Power Amplifier*, IEEE Journal of Solid-State Circuits 52, 1295 (2017).
- [15] A. Cidronali, S. Maddio, N. Giovannelli, and G. Collodi, Frequency Analysis and Multiline Implementation of Compensated Impedance Inverter for Wideband Doherty High-Power Amplifier Design, IEEE Transactions on Microwave Theory and Techniques 64, 1359 (2016).
- [16] M. Hashemi, M. S. Alavi, and L. C. N. De Vreede, Pushing the Linearity Limits of a Digital Polar Transmitter, in 2018 13th European Microwave Integrated Circuits Conference (EuMIC) (2018) pp. 174–177.
- [17] L. R. Kahn, Single-Sideband Transmission by Envelope Elimination and Restoration, Proceedings of the IRE 40, 803 (1952).
- [18] K. Oishi, E. Yoshida, Y. Sakai, H. Takauchi, Y. Kawano, N. Shirai, H. Kano, M. Kudo, T. Murakami, T. Tamura, S. Kawai, K. Suto, H. Yamazaki, and T. Mori, A 1.95 GHz Fully Integrated Envelope Elimination and Restoration CMOS Power Amplifier Using Timing Alignment Technique for WCDMA and LTE, IEEE Journal of Solid-State Circuits 49, 2915 (2014).
- [19] P. T. M. van Zeijl and M. Collados, A Digital Envelope Modulator for a WLAN OFDM Polar Transmitter in 90 nm CMOS, IEEE J. of Solid-State Circuits 42, 2204 (2007).
- [20] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, A Digitally Modulated Polar CMOS Power Amplifier With a 20-MHz Channel Bandwidth, IEEE J. of Solid-State Circuits 43, 2251 (2008).
- [21] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, An Efficient Mixed-Signal 2.4-GHz Polar Power Amplifier in 65-nm CMOS Technology, IEEE J. of Solid-State Circuits 46, 1796 (2011).
- [22] L. Ye, J. Chen, L. Kong, E. Alon, and A. Niknejad, Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation, IEEE J. of Solid-State Circuits 48, 3160 (2013).

- [23] M. Hashemi, Y. Shen, M. Mehrpoo, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, An Intrinsically Linear Wideband Digital Polar PA Featuring AM-AM and AM-PM Corrections Through Nonlinear Sizing, Overdrive-Voltage Control, and Multiphase RF Clocking, in IEEE ISSCC Dig. Tech. Papers (ISSCC) (2017) pp. 300–301.
- [24] M. Hashemi, L. Zhou, Y. Shen, M. Mehrpoo, and L. de Vreede, *Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and 40 dBc ACLR without DPD*, in 2017 IEEE MTT-S International Microwave Symposium (IMS) (2017) pp. 2025–2028.
- [25] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, *An Intrinsically Linear Wideband Polar Digital Power Amplifier*, IEEE Journal of Solid-State Circuits 52, 3312 (2017).
- [26] A. Grebennikov, RF and Microwave Power Amplifier Design (McGraw-Hill, 2004).
- [27] J. H. Qureshi, W. Sneijers, R. Keenan, L. C. N. deVreede, and F. van Rijs, A 700-W peak ultra-wideband broadcast Doherty amplifier, in 2014 IEEE MTT-S International Microwave Symposium (IMS2014) (2014) pp. 1–4.
- [28] A. M. Pavio and S. K. Sutton, A microstrip re-entrant mode quadrature coupler for hybrid and monolithic circuit applications, in IEEE International Digest on Microwave Symposium (1990) pp. 573–576 vol.1.
- [29] C. M. Tsai and K. C. Gupta, CAD procedures for planar re-entrant type couplers and three-line baluns, in 1993 IEEE MTT-S International Microwave Symposium Digest (1993) pp. 1013–1016 vol.2.
- [30] R. K. Mongia, I. J. Bahl, and P. Bhartia, *RF and Microwave Coupled-Line Circuits* (Artech House, 2007).
- [31] G. D. Ewing, *High-Efficiency Radio-Frequency Power Amplifiers*, Ph.D. thesis, Dept. Elect. Eng., Oregon State University (1964).
- [32] N. O. Sokal and A. D. Sokal, Class E-A New Class of High-Efficiency Tuned Single-Ended Switching Power Amplifiers, IEEE J. of Solid-State Circuits 10, 168 (1975).
- [33] F. H. Raab, *Class-E, Class-C, and Class-F Power Amplifiers based upon a Finite Number of Harmonics*, IEEE Trans. on Microw. Theory Techn. **49**, 1462 (2001).

- [34] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, *The Class-E/F Family of ZVS Switching Amplifiers*, IEEE Trans. on Microw. Theory Techn. **51**, 1677 (2003).
- [35] N. Kumar, C. Prakash, A. Grebennikov, and A. Mediano, *High-Efficiency Broadband Parallel-Circuit Class E RF Power Amplifier With Reactance-Compensation Technique*, IEEE Transactions on Microwave Theory and Techniques **56**, 604 (2008).
- [36] M. Acar, A. J. Annema, and B. Nauta, *Analytical Design Equations for Class-E Power Amplifiers*, IEEE Trans. on Circuits and Systems I: Regular Papers **54**, 2706 (2007).
- [37] P. Tsai, Y. Lin, J. Kuo, Z. Tsai, and H. Wang, Broadband Balanced Frequency Doublers With Fundamental Rejection Enhancement Using a Novel Compensated Marchand Balun, IEEE Transactions on Microwave Theory and Techniques 61, 1913 (2013).
- [38] P. Wright, J. Lees, J. Benedikt, P. J. Tasker, and S. C. Cripps, A Methodology for Realizing High Efficiency Class-J in a Linear and Broadband PA, IEEE Transactions on Microwave Theory and Techniques 57, 3196 (2009).
- [39] S. Rezaei, L. Belostotski, F. M. Ghannouchi, and P. Aflaki, *Integrated Design of a Class-J Power Amplifier*, IEEE Transactions on Microwave Theory and Techniques 61, 1639 (2013).
- [40] C.-H. Lin and K. Bult, A 10-b, 500-MSample/s CMOS DAC in 0.6 mm<sup>2</sup>, IEEE Journal of Solid-State Circuits 33, 1948 (1998).
- [41] A. Oppenheim and R. Schafer, *Discrete-time Signal Processing*, Prentice-Hall signal processing series (Prentice Hall, 1989).

## 6

### DIGITAL PREDISTORTION AND SYSTEM-LEVEL CONSIDERATIONS TO FURTHER PUSH THE LINEARITY LIMITS OF A POLAR DPA

Most parts of this chapter have been published in the EuMIC conference, September 2018, Madrid, Spain [1].

#### **6.1.** INTRODUCTION

In previous chapters, we have shown that, compared to a conventional DPA design, by using the novel circuit-level linearization techniques as depicted in Fig. 6.1, a DPA can be designed to be linear enough such that it can operate without DPD while still meeting the spectral masks of wireless applications. However, considering the DPA as an RF digital-to-analog converter (RFDAC), one may wonder: what are the minimum EVM and ACPR that are achievable even with an ideally linear DPA, and how can we reach those limits?

In this chapter, first, the most significant limiting factors of the EVM and ACPR in a DPA are discussed, and the approach to further improve them is explained. Second, it is shown how to optimally predistort a DPA using the iterative learning control (ILC) technique [2] to reach the theoretical limits given by the quantization noise in combination with other system- and circuit-level techniques. Finally, based on the ILC technique, a novel real-time direct-learning DPD technique is proposed, which, in contrast to conventional direct-learning DPD, extracts its parameters directly using the LS algorithm. The latter has been experimentally verified by measurement results.

#### **6.2.** LINEARITY LIMITS OF A POLAR DPA

A switch-mode DPA typically shows significant static nonlinearities, which normally can be corrected using on-chip LUT-DPD. Compared to an analog PA, in a well-designed DPA, the memory effects are typically smaller, since only the bias connection of the final stage tends to contribute to these memory effects. Nevertheless, besides the dynamic and static nonlinearities, there are four other limiting factors in a digital polar TX, namely: *bandwidth expansion* of the AM and PM paths, *timing mismatch* between the AM and PM paths, *aliasing of residual sampling spectral replicas* of the AM and PM signals, and *nonuniform quantization noise*. The first two factors are shared with a conventional analog polar TX; their impact on linearity and maximum achievable bandwidth has been discussed in Chapter 2. For example, as depicted in Fig. 1.6c, the normalized AM-PM mismatch for an EVM and ACPR below -60 dB/dBc should be less than 0.11%, meaning that for a 20 MHz bandwidth, the timing mismatch should be better than 55 ps. The effect of the latter two will be discussed in the following sections.
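As a quick check of the timing requirement quoted above, and assuming the mismatch is normalized to the inverse of the signal bandwidth (an interpretation that reproduces the quoted value, not a definition restated from Chapter 2), the arithmetic is:

$$\Delta t_{max} = \frac{0.11\,\%}{BW} = \frac{1.1 \times 10^{-3}}{20\ \text{MHz}} = 55\ \text{ps}$$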



Figure 6.1: (a) Conventional digital Polar TX with a switch-mode DPA showing ACW-AM and ACW-PM nonlinearities, and (b) a linearized DPA using nonlinear sizing, overdrive-voltage control, and multiphase RF clocking.

#### 6.2.1. Aliasing of the Residual Sampling Spectral Replicas

In a digital polar TX, since there is no explicit reconstruction baseband filter other than the natural zero-order-hold (ZOH) operation of the RFDACs, the residual sampling spectral replicas (SSRs) of the baseband AM and PM signals will mix with the RF carrier and its harmonics, folding back into the desired band. This phenomenon is shown in Fig. 6.2a. For example, with a sampling frequency ($F_S$) equal to the carrier frequency ($F_C$), the residuals of the 2<sup>nd</sup> and 4<sup>th</sup> SSRs will mix with the 3<sup>rd</sup> harmonic of the carrier and map directly to $F_C$. By decreasing $F_S$ to $F_C/4$, SSRs below the 8<sup>th</sup> will not fold back by mixing with the odd harmonics. On the other hand, further decreasing $F_S$ leads to the aliasing of the skirt of the 1<sup>st</sup> SSR (with expanded bandwidth) to the desired frequency band, as shown in Fig. 6.2b for wideband signals (BW = 128–256 MHz). By filtering the SSRs of the PM signal, the ACPR and EVM improve, as shown in Fig. 6.2c. Therefore, in this work, in order to achieve the maximum spectral and in-band purity, the SSRs of the PM signal are removed by up-sampling and filtering the phase in the digital domain. Nonetheless, by carefully selecting $F_S$ for a given $F_C$, it is possible to minimize the impact of the aliasing SSRs on the achievable spectral purity. Even for narrowband (e.g., 16 MHz) signals, the ACPR and EVM improve by up to 3–4 dB after filtering the SSRs of the PM signal. Further improvement will be achieved by using DPD and precise AM-PM delay matching.
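The folding condition in the examples above can be checked numerically. The sketch below assumes a simplified model in which the $k$-th SSR sits at $kF_S$ and folds onto the carrier when $|mF_C - kF_S| = F_C$ for an odd carrier harmonic $m$. The function name, the harmonic set (3, 5, 7) and the search limit are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def folding_ssrs(fs_over_fc, max_ssr=16, harmonics=(3, 5, 7)):
    """Return (ssr_index, harmonic) pairs whose mixing product lands on F_C."""
    ratio = Fraction(fs_over_fc)    # F_S / F_C, kept exact
    hits = []
    for k in range(1, max_ssr + 1):
        for m in harmonics:
            # Frequencies normalized to F_C: |m - k*ratio| == 1 means a hit
            if abs(m - k * ratio) == 1:
                hits.append((k, m))
    return hits

hits_equal = folding_ssrs(1)                  # F_S = F_C
hits_quarter = folding_ssrs(Fraction(1, 4))   # F_S = F_C / 4
lowest_quarter = min(k for k, _ in hits_quarter)
```

Under this model, the 2nd and 4th SSRs fold via the 3rd harmonic when $F_S = F_C$, and the lowest folding SSR for $F_S = F_C/4$ is the 8th, consistent with the statements in the text.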

#### **6.2.2.** NONUNIFORM QUANTIZATION NOISE

Considering the DPA as an RFDAC, its output quantization noise defines the limit on the accuracy of the transmitted signal, impacting both the ACPR and EVM. For an ideally linear RFDAC with *N*-bits resolution and an  $F_S$  sampling rate, the quantization noise (QN) within the bandwidth *BW* can be calculated by [3]:

$$QN(dBc) = -6.02N - 1.76 + PAPR - 10\log\left(\frac{F_S}{BW}\right)$$
(6.1)

where PAPR (dB) is the peak-to-average power ratio. In an ideal RFDAC, EVM is almost equal to the quantization noise [4]. Equation 6.1 assumes that the quantization noise is uniform versus the input signal amplitude. However, in a conventional energy-efficient nonlinear DPA, the quantization error takes a different value for each AM codeword (see Fig. 6.3c). For example, as depicted conceptually in Fig. 6.3a for a low-resolution DPA, the normalized transfer function at ACW = 1 is ~3× larger than that of a linear DPA, resulting in a ~10 dB ($20\log_{10} 3 \approx 9.5$ dB) loss of dynamic range. This manifests itself in the output spectrum by increasing the in-band and close-in, out-of-band noise floor, as shown in Fig. 6.3b. Therefore, even with ideal DPD, the ACPR and EVM will be higher than the value expected from (6.1). However, by using the aforementioned circuit-level linearization techniques, the quantization noise of a DPA can be made almost uniform, as shown in Fig. 6.3c, allowing a minimum ACPR and EVM to be achieved in combination with a proper DPD. Figure 6.4 shows the simulated ACPR and EVM as well as the calculated QN (dBc) for a 16 MHz OFDM signal with PAPR = 9.2 dB, up-sampled to $F_S = 500$ MHz (using a root-raised-cosine (RRC) FIR filter with $\beta = 0.1$, Span = 50), and quantized by an ideal *N*-bit DPA. For example, with *N* = 9 bits, the ideal EVM/ACPR is about -61.7 dB/dBc for a 16 MHz OFDM signal.
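Equation 6.1 is easy to evaluate for the example quoted above. The following minimal sketch (the function name is illustrative) computes the quantization noise for a 9-bit DPA, PAPR = 9.2 dB, $F_S = 500$ MHz and a 16 MHz signal bandwidth.

```python
import math

def quantization_noise_dbc(n_bits, papr_db, fs_hz, bw_hz):
    """Equation 6.1: in-band quantization noise of an ideal N-bit RFDAC."""
    oversampling_gain_db = 10 * math.log10(fs_hz / bw_hz)
    return -6.02 * n_bits - 1.76 + papr_db - oversampling_gain_db

qn = quantization_noise_dbc(n_bits=9, papr_db=9.2, fs_hz=500e6, bw_hz=16e6)
print(f"{qn:.1f} dBc")   # prints -61.7 dBc
```

The result agrees with the ideal EVM/ACPR of about -61.7 dB/dBc stated for *N* = 9 bits.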




Figure 6.2: (a) Residual sampling spectral replicas (SSRs) of AM and PM signals, (b) simulated output spectrum of a digital polar TX with  $F_C = 2$  GHz,  $BW_0=128$  MHz, and a square wave carrier signal before and (c) after filtering the SSRs of the PM signal.




Figure 6.3: (a) Exaggerated ACW-AM curves of a nonlinear DPA, a linearized nonlinear DPA w/ DPD as a nonuniform quantizer, and an intrinsically linear DPA as a uniform quantizer, (b) resultant simulated output spectra, and (c) normalized quantization error (LSB) of a nonlinear DPA with DPD and the proposed intrinsically linear DPA.



Figure 6.4: Simulated ACPR and EVM as well as the calculated QN for a 16 MHz OFDM signal up-sampled to  $F_S = 500$  MHz with *PAPR* = 9.2 dB.



Figure 6.5: (a) Concept of the ILC technique with an almost linear DPA, and (b) block diagram of the ILC technique assisted by LUTs for fast convergence.

#### **6.3.** LINEARIZATION OF DIGITAL POLAR TRANSMITTERS

#### 6.3.1. LINEARIZATION USING ILC WITH LUTS

The goal of predistortion is to modify the input signal of a PA such that the resultant output signal is a linearly scaled copy of the TX input signal. However, as mentioned in Chapter 3, this is not a trivial problem, because initially we do not know exactly what the output of the DPD (i.e. the input of the PA) should be to ensure that the baseband equivalent of the PA output is a scaled copy of the original input signal. If we had the ideal output signal of the DPD, this would become a classic identification problem that could be solved easily using the LS algorithm.

Here, a novel DPD technique is proposed which solves this problem by finding the expected output signal of the DPD so that the LS algorithm can be used to estimate the DPD model parameters. This DPD is based on the ILC technique, which is employed to find the input of a system by iteratively subtracting the feedback error signal from the input until the desired output is generated within an acceptable range. Consequently, with the optimum input and output, the parameters of the mathematical DPD can be simply estimated using the LS algorithm.

In this work, static LUTs are used to increase the convergence rate of the ILC technique. The concept and block diagram of this technique are shown in Fig. 6.5. For a given sequence of a TX input signal  $X = [x(0) \ x(1)... \ x(N)]^T$ , the optimum input signal  $U = [u(0) \ u(1)... \ u(N)]^T$  to the LUT is found after a few iterations, when the normalized output signal  $Y = [y(0) \ y(1)... \ y(N)]^T$  equals X with an acceptable accuracy. The combination of LUT-DPD and a DPA can be modeled as follows:

$$Y = U + E(U), \quad U_0 = X,$$
 (6.2)

where E(U) is the error caused by distortion. Thanks to the LUTs, the error signal is much smaller than the desired signal. Consequently, we assume that the DPA gain is normalized and any change in the input signal appears almost equal at the output. Therefore, after one iteration, we obtain:

$$U_1 = U_0 - E(U_0) \Rightarrow Y_1 = X - E(X) + E(X - E(X)),$$
(6.3)

Since the distortion error *E* is much smaller than *X*, *E*(*X*) and *E*(*X* – *E*(*X*)) are almost equal, and we obtain  $Y_1 \approx X$ . By iteratively calculating the error signal and subtracting it from the TX input signal *X*, we obtain the DPA input signal  $U_K$  after *K* iterations as follows:

$$U_{K} = \left(\sum_{i=0}^{K} \mu^{i}\right) X - \sum_{i=0}^{K-1} \mu^{K-i} Y_{i},$$
(6.4)

where  $\mu$  is the convergence gain. For a highly nonlinear DPA, the ILC algorithm assisted by LUTs reaches the desired ACPR/EVM performance after 2-3 iterations, compared to the basic ILC algorithms [5, 6], which need 4-5 iterations.
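The iteration behind Equation 6.4 can be illustrated with a small numerical sketch. It assumes a hypothetical memoryless LUT+DPA response with a weak residual compression term (`toy_dpa`), which is not the measured DPA of this work. The update $U_{k+1} = X - \mu E(U_k)$ is the recurrence obtained by unrolling Equation 6.4; it is inferred from the equation rather than stated explicitly in the text.

```python
import numpy as np

def toy_dpa(u):
    """Hypothetical LUT+DPA response Y = U + E(U) with a small residual error E."""
    return u - 0.05 * u * np.abs(u) ** 2

def run_ilc(x, mu=1.0, iterations=3):
    """Return the inputs U_0..U_K and the outputs Y_0..Y_{K-1} of the ILC loop."""
    u = x.copy()
    inputs, outputs = [u], []
    for _ in range(iterations):
        y = toy_dpa(u)           # Y_k = U_k + E(U_k), Equation 6.2
        outputs.append(y)
        u = x - mu * (y - u)     # U_{k+1} = X - mu * E(U_k)
        inputs.append(u)
    return inputs, outputs

def closed_form_input(x, outputs, mu, k):
    """Equation 6.4 evaluated from the recorded outputs Y_0..Y_{k-1}."""
    gain = sum(mu ** i for i in range(k + 1))
    correction = sum(mu ** (k - i) * outputs[i] for i in range(k))
    return gain * x - correction

def nmse_db(y, x):
    """Normalized mean-square error between y and the target x, in dB."""
    return 10 * np.log10(np.sum(np.abs(y - x) ** 2) / np.sum(np.abs(x) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(2048) + 1j * rng.standard_normal(2048)
x /= np.max(np.abs(x))           # normalize the peak amplitude to 1

inputs, outputs = run_ilc(x, mu=1.0, iterations=3)
final_output = toy_dpa(inputs[-1])
print("NMSE before ILC (dB):", round(nmse_db(outputs[0], x), 1))
print("NMSE after 3 iterations (dB):", round(nmse_db(final_output, x), 1))
print("matches Eq. 6.4:", np.allclose(inputs[3], closed_form_input(x, outputs, 1.0, 3)))
```

Because the residual error of this toy model is small compared with the signal, each iteration shrinks the output error, in line with the argument given for $Y_1 \approx X$.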

#### 6.3.2. PROPOSED ILC-INSPIRED DIRECT-LEARNING DPD

Since the ILC algorithm is an offline linearization technique, a new real-time direct-learning DPD technique based on Equation 6.4 is proposed. In this approach, the behavior of the LUT-DPD and the DPA together is estimated using a GMP model [7].



Figure 6.6: Block diagram of the proposed DPD: (a) general, and (b) simplified form.

An LS algorithm estimates the model parameters by the following:

$$A_i = (M_i^H M_i)^{-1} M_i^H Y_i,$$
(6.5)

where  $M_i$  is the basis function matrix of signal vector U in the  $i^{th}$  iteration. Compared to other DPD algorithms, this technique does not introduce additional computational complexity, as it uses the same conventional techniques to find the pseudo-inverse matrix  $(M_i^H M_i)^{-1} M_i^H$ . In practice, solving the system of equations  $M_i^H M_i A_i = M_i^H Y_i$  is faster and more power-efficient than explicitly computing the pseudo-inverse of  $M_i$  to find  $A_i$  [7].
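The two ways of evaluating Equation 6.5 can be compared in a short sketch. For brevity it assumes a plain memory-polynomial basis (a subset of the GMP model) and synthetic, noise-free data; the basis orders, memory depth, and coefficient values are illustrative assumptions only.

```python
import numpy as np

def mp_basis(u, orders=(1, 3, 5), memory=2):
    """Memory-polynomial basis matrix with columns u[n-m] * |u[n-m]|^(p-1)."""
    cols = []
    for m in range(memory):
        delayed = np.concatenate([np.zeros(m, dtype=complex), u[:len(u) - m]])
        for p in orders:
            cols.append(delayed * np.abs(delayed) ** (p - 1))
    return np.column_stack(cols)

rng = np.random.default_rng(1)
u = (rng.standard_normal(2000) + 1j * rng.standard_normal(2000)) / 2

# Hypothetical "true" model coefficients used to synthesize the output Y.
true_a = np.array([1.0, -0.1 + 0.05j, 0.02, 0.05j, 0.01, -0.005])
M = mp_basis(u)
y = M @ true_a

# Normal equations: solve (M^H M) A = M^H Y without forming an explicit inverse.
gram = M.conj().T @ M
a_solve = np.linalg.solve(gram, M.conj().T @ y)

# Reference: explicit pseudo-inverse of M.
a_pinv = np.linalg.pinv(M) @ y

print("max difference between both solutions:", np.max(np.abs(a_solve - a_pinv)))
```

Both routes return the same coefficients here. The linear solve works on the small square matrix $M^H M$ and avoids forming an inverse or pseudo-inverse explicitly.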

Figure 6.7: Conceptual block diagrams of (a) conventional direct-learning DPD and (b) conventional indirect-learning DPD compared to (c) the proposed direct-learning DPD.

After the LUT-DPD, the phase signal is again up-sampled and interpolated in the digital domain to remove the SSRs of the PM path. Next, in the DPD blocks, the GMP model parameters of the combined LUT+DPA are updated and the output signal is estimated as  $y_{E,i}$, which is equivalent to the *i*<sup>th</sup> iteration of the ILC algorithm. The block diagram of the proposed DPD is shown in Fig. 6.6a. Each LUT+DPA model block is linked to an iteration step in the ILC algorithm. Therefore, the model parameters can be slightly different from each other, since they are estimated from different input/output signals. However, for simplicity, the same DPA model can be used for all of the cascaded blocks. Furthermore, the convergence gain  $\mu$  is set to 1 for faster convergence. In this case, the proposed DPD can be modeled as:

$$u_K[n] = (K+1)x[n] - \sum_{i=0}^{K-1} y_{E,i}[n].$$
(6.6)

Consequently, thanks to the nonlinearity order reduction from the LUT-DPD and DPA combination, only one DPA-model block is sufficient. Therefore, the proposed DPD is simply modeled as follows:

$$u[n] = 2x[n] - y_E[n].$$
(6.7)

The simplified structure of the proposed DPD is shown in Fig. 6.6b. The ILC-based DPDs in [5, 6] need two steps: first, finding the optimum output of the DPD iteratively by applying the ILC technique, and second, extracting the DPD model parameters. In contrast, the proposed DPD in this work needs only one step, which is extracting the PA model parameters using the LS algorithm and then using them directly in the DPD. Furthermore, as depicted conceptually in Fig. 6.7c, the proposed DPD is a direct-learning DPD in which, in contrast to conventional direct-learning DPDs (Fig. 6.7a), the DPD parameters are directly extracted using the LS algorithm, similar to an indirect-learning DPD (see Fig. 6.7b), but without the accuracy penalty of applying the post-inverse of the PA instead of its pre-inverse.
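The one-step structure of Equation 6.7 can be sketched as follows. This is a minimal illustration under stated assumptions: the LUT+DPA combination is replaced by a hypothetical memoryless toy response, and the forward model uses only two basis functions instead of a GMP model. All function names are illustrative.

```python
import numpy as np

def toy_dpa(u):
    """Hypothetical LUT+DPA response with a weak residual nonlinearity."""
    return u - 0.05 * u * np.abs(u) ** 2

def basis(u):
    """Two-term memoryless polynomial basis (a stand-in for the GMP model)."""
    return np.column_stack([u, u * np.abs(u) ** 2])

def fit_forward_model(u, y):
    """LS estimate of the forward LUT+DPA model via the normal equations (Eq. 6.5)."""
    m = basis(u)
    return np.linalg.solve(m.conj().T @ m, m.conj().T @ y)

def proposed_dpd(x, coeffs):
    """Equation 6.7: u[n] = 2 x[n] - y_E[n], with y_E the modeled output for x."""
    y_e = basis(x) @ coeffs
    return 2 * x - y_e

def nmse_db(y, x):
    """Normalized mean-square error between y and the target x, in dB."""
    return 10 * np.log10(np.sum(np.abs(y - x) ** 2) / np.sum(np.abs(x) ** 2))

def random_signal(rng, n):
    """Complex Gaussian test signal with its peak amplitude normalized to 1."""
    s = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return s / np.max(np.abs(s))

rng = np.random.default_rng(2)

# Step 1: extract the forward-model parameters from one input/output record.
x_train = random_signal(rng, 4000)
coeffs = fit_forward_model(x_train, toy_dpa(x_train))

# Step 2: use the model directly as the DPD on new data.
x_new = random_signal(rng, 4000)
u = proposed_dpd(x_new, coeffs)
nmse_without = nmse_db(toy_dpa(x_new), x_new)
nmse_with = nmse_db(toy_dpa(u), x_new)
print("NMSE without DPD (dB):", round(nmse_without, 1))
print("NMSE with DPD (dB):", round(nmse_with, 1))
```

Only the forward model is identified. No separate search for the ideal DPD output is needed, which is the distinction from the two-step ILC-based DPDs drawn in the text.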

#### **6.4.** MEASUREMENT RESULTS

To evaluate the proposed DPD approaches, a digital polar nonlinearly-sized CMOS DPA with multiphase RF clocking is tested in the measurement setup shown in Fig. 6.8. A Keysight M8190A 12 GSa/s AWG is used for generating both the RF PM signal and the baseband sampling clock. The input I/Q signals are converted to amplitude and phase in MATLAB with  $F_S = 500$  MHz. Next, the phase data are up-sampled, and the SSRs are filtered. Then the PM signal is up-converted to  $F_C = 2$  GHz in the digital domain and loaded into the M8190A AWG. The ACW data are stored in an on-chip 4K-SRAM memory. The AM-PM timing mismatch is corrected by an on-chip FIR Sinc filter and delay lines [8]. The output signal is down-converted and digitized using an R&S FSW and downloaded onto a PC. The DPD algorithms are implemented in MATLAB.

Figure 6.8: Measurement setup.

Figure 6.9: Measured output spectrum of (a) 16 MHz OFDM signal with LUT-DPD and offline ILC-DPD after two iterations, and (b) 64 MHz OFDM signal with the proposed real-time DPD.

Fig. 6.9a shows the measured output spectrum of a 16 MHz OFDM signal using LUT-DPD with offline ILC-DPD after two iterations, and Fig. 6.9b shows the wide-span measured output spectrum of a 64 MHz OFDM signal using the proposed real-time DPD. To compare the linearity performance, the output spectrum, AM-AM (gain error), and AM-PM (phase error) curves are measured for 16 MHz and 64 MHz OFDM signals using no DPD, LUT-DPD, offline ILC-DPD, and the proposed real-time DPD. The measured output spectrum, AM-AM, and AM-PM curves are shown in Fig. 6.10, Fig. 6.11, and Fig. 6.12, respectively. For the 16 MHz signal, the offline ILC algorithm results in -60 dBc ACPR and -60 dB EVM, and the proposed DPD results in -55 dBc ACPR and -50 dB EVM, as shown in Fig. 6.10a. The ACPR and EVM are also measured for the 16 MHz signal without filtering the PM SSRs using the offline ILC technique, which results in -57 dBc ACPR and -56 dB EVM, both of which are degraded by the SSR phenomenon. For the 64 MHz OFDM signal, the offline ILC algorithm results in -53 dBc ACPR and -47 dB EVM, and the proposed DPD results in -48 dBc ACPR and -44 dB EVM, as shown in Fig. 6.10b. Table 6.1 compares the performance of the different DPD techniques used in this work.

Figure 6.10: Measured output spectrum of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals.

Figure 6.11: Measured AM-AM (gain error) of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals.



Figure 6.12: Measured AM-PM (phase error) of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals.

| DPD                | 16 MHz OFDM ACPR (dBc) | 16 MHz OFDM EVM (dB) | 64 MHz OFDM ACPR (dBc) | 64 MHz OFDM EVM (dB) |
|--------------------|------------------------|----------------------|------------------------|----------------------|
| No DPD             | -44                    | -36                  | -38                    | -31                  |
| Real-Time LUT      | -47                    | -40                  | -39                    | -31                  |
| Offline ILC        | -60                    | -60                  | -53                    | -47                  |
| Real-Time Proposed | -55                    | -50                  | -48                    | -44                  |

Table 6.1: Comparison of the linearity achieved with the different DPD techniques in this work.

#### 6.5. CONCLUSION

In this chapter, DPD techniques for a digital polar TX are presented. Considering the RFDAC-associated nonidealities, it is demonstrated that by using an ILC technique assisted by LUTs, along with circuit-level linearization techniques to maintain uniform quantization noise and filtering to remove the SSRs of the PM signal, it is possible to achieve a -60 dBc ACPR and -60 dB EVM with a 9-bit polar DPA, which is close to its theoretical quantization noise limit. Furthermore, a novel real-time direct-learning DPD technique inspired by the ILC technique is proposed, providing a -55/-48 dBc ACPR and -50/-44 dB EVM for 16/64 MHz OFDM signals, which to the best of the author's knowledge is the highest linearity reported for a wideband digital polar transmitter. Moreover, in contrast to conventional direct-learning DPDs, the parameters of the proposed DPD are directly extracted using the LS algorithm.

#### REFERENCES

- [1] M. Hashemi, M. S. Alavi, and L. C. N. de Vreede, *Pushing the Linearity Limits of a Digital Polar Transmitter*, in 2018 13th European Microwave Integrated Circuits Conference (EuMIC) (2018) pp. 174–177.
- [2] K. L. Moore, *Iterative Learning Control* (Artech House, Norwood, MA, USA, 2006).
- [3] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, *A Wideband Linear I/Q-Interleaving DDRM*, IEEE Journal of Solid-State Circuits **53**, 1361 (2018).
- [4] H. A. Mahmoud and H. Arslan, *Error vector magnitude to SNR conversion for nondata-aided receivers*, IEEE Transactions on Wireless Communications **8**, 2694 (2009).
- [5] J. Chani-Cahuana, P. N. Landin, C. Fager, and T. Eriksson, *Iterative Learning Control for RF Power Amplifier Linearization*, IEEE Transactions on Microwave Theory and Techniques **64**, 2778 (2016).
- [6] M. Schoukens, J. Hammenecker, and A. Cooman, *Obtaining the Preinverse of a Power Amplifier Using Iterative Learning Control*, IEEE Transactions on Microwave Theory and Techniques **65**, 4266 (2017).
- [7] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, *A Generalized Memory Polynomial Model for Digital Predistortion of RF Power Amplifiers*, IEEE Transactions on Signal Processing **54**, 3852 (2006).
- [8] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, *An Intrinsically Linear Wideband Polar Digital Power Amplifier*, IEEE Journal of Solid-State Circuits **52**, 3312 (2017).

### **CONCLUSION**

This dissertation presents the analysis, design and implementation of novel intrinsically linear digital polar transmitters (TXs) for high-bandwidth, high-efficiency applications, targeting low to medium RF output power levels (up to a few watts) in particular. To attain this goal, both single and dedicated Doherty DPA TX lineups are studied and optimized for their linearity, bandwidth and efficiency. In addition, system-level considerations and digital pre-distortion (DPD) techniques are provided that further enhance the already state-of-the-art (raw) linearity and EVM/ACPR. Following this investigation, a novel low-complexity DPD technique is proposed that requires less DPD processing power. In this chapter, Section 7.1 summarizes the outcomes of this thesis and lists the achieved accomplishments. Next, Section 7.2 provides suggestions for future follow-up work.

#### **7.1.** THESIS OUTCOME

A polar modulator/transmitter is considered to be superior to Cartesian or LINC outphasing concepts in terms of its power efficiency and low design complexity. Furthermore, the polar concept is directly compatible with the Doherty technique, which is essential in handling wideband complex modulated signals with large peak-to-average power ratios in an energy-efficient manner.

With the current trend in the wireless industry gravitating towards more integrated system-on-chip (SoC) or system-in-package (SiP) solutions, on-chip digital signal processing (DSP) has become widely available. This significant change has triggered an intensified search for (new) RF functions that can directly benefit from the ultra-fast, low-power and low-voltage digital capabilities of nanometer-scale CMOS process technologies. The fact that both the baseband signal representation, as well as the RF (square-wave) clocks are present on-chip, allows the relatively straightforward implementation of a digital polar TX architecture. In such a configuration, the switch-mode PA output stage can be configured e.g. as a (digitally) segmented class-E output stage that is controlled for its output signal level by a digital amplitude code-word (ACW), while being driven by a phase-modulated RF clock. In spite of the excellent compatibility of the polar TX concept with CMOS technology, there is still much work to be done to achieve the desired linearity/spectral purity, wideband, and energy efficient performance required for 5G applications. It is these particular design aspects that form the core focus of this thesis work.

In Chapter 2, the polar TX, and more extensively, the digital polar TX architecture have been described. Novel RFDAC-based architectures for the phase modulator and amplitude modulator are introduced to reach the desired wideband modulation. Modulation bandwidth up to 80 MHz has in fact been measured, while reaching 100 MHz seems possible in the future, according to simulations. Class-E switching operation and its sub-classes have been evaluated in view of the intended output stage operation. The so-called load-insensitive class-E configuration (q=1.3) is identified as one of the most promising candidates when targeting energy efficiency enhancement techniques based on load modulation. Furthermore, the relation between the continuous-wave (CW) efficiency and average efficiency is given. Finally, the Doherty configuration is introduced to enhance the efficiency in power back-off operation when dealing with complex modulated signals. The basic Doherty concept is further adapted to make it suitable for digital class-E Doherty operation.

In Chapter 3, behavioral modeling techniques for nonlinear systems based on the Volterra series, such as memory-polynomial (MP) and generalized-MP (GMP) models, as well as parameter estimation techniques, such as the least-squares (LS) algorithm, are described. In addition, it is shown how the real-signal passband nonlinearity can be translated to a complex-signal baseband nonlinearity, which establishes the foundation from which the digital pre-distortion is performed in baseband rather than in RF. Also, the equivalent baseband models of the AM-AM and AM-PM conversions are analytically calculated. New basis functions are proposed that match the natural AM-AM curves of a switch-mode DPA more closely, hence reducing the order of the nonlinear kernels in a mathematical DPD significantly. Different digital pre-distortion techniques, including adaptive DPD, signal versus data DPD, and direct-learning and indirect-learning DPD, are described. Furthermore, the theoretical foundations of undersampling for nonlinear system identification are explained, and different undersampling techniques for the DPD model extraction are described briefly. Consequently, by using the DPD techniques described in this chapter, it is possible to implement very power-efficient and wideband adaptive DPD to increase the overall system efficiency and performance.

In Chapter 4, novel circuit linearization techniques are proposed that have been successfully patented, namely: nonlinear sizing, overdrive voltage control, and multiphase RF clocking. These techniques also allow digitally controlled fine tuning of the ACW-AM and ACW-PM curves, while allowing compensation for variations due to PVT, operating frequency, and output load. These techniques circumvent the need for any kind of DPD for low-power/low-end applications (such as handheld mobile phones), and tremendously relax the DPD task in high-power/high-end applications (such as base stations). The nonlinear behavior of a class-E DPA is thoroughly analyzed and closed-form equations are given to predict the amplitude-code-word (ACW)-AM and ACW-PM curves. To confirm the developed principles, two different linear DPA versions are designed, fabricated and measured: one with an on-chip matching network (MN), resulting in higher integration and a smaller form factor, and one with an off-chip MN. The off-chip MN is implemented using a novel compensated Marchand balun with re-entrant coupled lines based on transmission lines (TLs), which compared to the on-chip MN results in much higher peak drain efficiency (67% versus 44%). Without using any kind of DPD, the measured ACPR and EVM for a 40 MHz OFDM signal are -40 dBc and -33 dB, respectively, which is best-in-class for DPD-less polar switch-mode DPAs.

In Chapter 5, for the first time, an intrinsically linear wideband class-E CMOS Doherty DPA is reported. Closed-form equations are extracted to predict its ACW-AM and ACW-PM curves. System-level considerations are given, especially the effect of timing mismatch between the peak and main DPA signal paths, as well as the alignment of the AM and PM signals themselves. Details are also presented of the design and implementation of a novel off-chip TL-based matching/load network for the Doherty PA, based on a compensated Marchand balun with re-entrant coupled lines. By using this off-chip MN, more than 50 % drain efficiency (DE) at 6 dB power backoff is achieved, with over 30 % of relative efficiency bandwidth. By using circuit-level linearization techniques similar to those mentioned above, two separate chips with the same architecture but different DPA parameters are designed and fabricated. Measured results show a -41 dBc ACPR and -36 dB EVM for a 16 MHz OFDM signal without using DPD. By using DPD, a -52 dBc ACPR and -50 dB EVM are measured for a 16 MHz OFDM signal, which to the best of the author's knowledge are the best results ever reported for a fully digital Doherty PA in terms of spectral purity.

In Chapter 6, the theoretical limiting factors of linearity (i.e. ACPR/EVM) of a DPA in a digital polar TX are further analyzed. Next, two less-studied but significant system-level factors which can also limit the ACPR/EVM performance, namely non-uniform quantization noise and sampling spectral replicas (SSRs) of the PM signal, are described. Finally, solutions to overcome these factors, such as nonlinear sizing at the circuit level and oversampling and filtering the PM signal at the system level, are described. It is demonstrated that by combining the circuit-level linearization techniques proposed in this thesis with system-level considerations and optimal digital pre-distortion based on the iterative learning control (ILC) technique, it is possible to reach the minimum theoretical limit of the ACPR and EVM for the realized digital TX hardware. The result is a -60 dBc ACPR and -60 dB EVM for a 16 MHz OFDM signal with a 9-bit DPA, which is very close to the theoretical quantization noise limit of the realized hardware. This is, to the best of the author's knowledge, the highest linearity reported for a wideband digital polar transmitter. Furthermore, a novel real-time direct-learning DPD inspired by the ILC technique is proposed, providing a -55 dBc ACPR and -50 dB EVM. This novel DPD is expected to achieve a high linearity performance similar to conventional direct-learning DPD, but with a complexity similar to that of conventional indirect-learning DPD based on MP/GMP models.

### **7.2.** SUGGESTIONS FOR FUTURE DEVELOPMENTS

Although this dissertation work yields several novel circuit-level and system-level techniques to design linear as well as energy efficient digital transmitters, there is still much room for further research and development towards an industrially compatible design in terms of system integration, linearity and output power. In the following, some suggestions are given for further research and development:

- New basis functions are introduced in Chapter 3 for modeling/correcting the ACW-AM of a single DPA. This technique can be further extended to modeling/correcting the ACW-PM and the ACW-AM/PM of a Doherty DPA by using the already derived closed-form equations in Chapter 4 and Chapter 5.
- The DPAs in this work are implemented in bulk 40 nm CMOS mainly for cost and availability reasons. The low breakdown voltage of such a technology automatically restricts the related hardware to lower output power levels (e.g. up to one watt). However, by using high-voltage RF technologies such as LDMOS (Si-based) or GaN FETs, significantly higher output power can be realized (e.g. 25 W). With the proper technology/circuit adjustments, these (LDMOS/GaN) power output stage devices can be directly driven by a CMOS chip. Using the novel concepts presented in this thesis, such a high-power DPA can also be made intrinsically linear. Practical high-power DTX implementations targeting higher frequencies or resolution might still be challenged by the lack of high-quality P-channel FETs in these power technologies.
- In this thesis work, the CORDIC and phase modulator are still implemented off-chip. However, by implementing these blocks on-chip, a fully integrated digital polar TX can be designed which is intrinsically linear.

- The proposed circuit-level techniques within this work seem to be sufficient in terms of spectral purity (ACPR/EVM) for most handheld/low-power applications. However, for higher power levels and more demanding wideband applications with very strict requirements on out-of-band emissions, such as 5G/mMIMO base stations, it is likely that there is still a need for some (relaxed/low-power) DPD. With this in mind, for a fully integrated digital RF solution with a CORDIC, upsampler, memory (e.g. LUTs), and FIR/IIR filters already available on-chip, the addition of extra on-chip DSP to include a real-time mathematical DPD engine is no longer a large step. Note that such a new functionality would allow significant reductions in complexity and cost in future generations of 5G mMIMO base stations, which would increase market acceptance of DTX-based products.
- To make the DTX robust to variations in voltage/temperature/signal properties/antenna load/etc., a monitoring feedback system is necessary, especially for high-power applications. By using the undersampling techniques described in Chapter 3, a very low-power, low-cost feedback system (i.e. with on-chip ADCs) can be designed to extract and update the DPD parameters on-chip. Furthermore, machine-learning techniques could additionally be used to make the DPD parameter extraction and adaptation smarter and more efficient by means of compressive sensing and non-uniform sampling.
- As explained in Chapter 6, the sampling spectral replicas (SSRs) of the PM signal can degrade the EVM/ACPR by at least 3-4 dB. Therefore, for band-limited applications, it is better to implement the phase modulator with a passband filter prior to the limiter to filter out these SSRs as much as possible. Alternatively, the phase modulator can be implemented by a PLL/ADPLL to benefit from the suppression of the SSRs by the loop filter. However, a(n) (AD)PLL-based phase modulator typically cannot support the bandwidth of the RFDAC-based solution proposed in this thesis.

# **LIST OF ACRONYMS**

- 1D One-dimensional
- 2D Two-dimensional
- ACPR Adjacent Channel Power-Ratio
- ACW Amplitude-Code-Word
- ADC analog-to-digital converter
- ADPLL All-Digital PLL
- AM Amplitude Modulation/Amplitude-modulated
- AR Augmented Reality
- Balun balanced-unbalanced
- **BB** Baseband
- BW Bandwidth
- CML current-mode logic
- CMOS complementary metal-oxide-semiconductor
- CORDIC COordinate Rotation DIgital Computer
- DAC Digital-to-Analog Converter
- DC Direct Current
- **DDTX** Direct Digital Transmitter
- DE Drain Efficiency
- DNL differential nonlinearity

- DPA Digitally Controlled Power Amplifier
- **DPD** Digital Predistortion
- DSP digital signal processing
- **EER** Envelope-Elimination and Restoration
- EVM Error-Vector Magnitude
- FFT Fast Fourier Transform
- FIR finite impulse response
- I In-phase
- IC integrated circuit
- ILC Iterative-Learning-Control
- IoT Internet-of-Things
- IQ Cartesian
- I/Q I/Q vector
- **IQ**<sub>BB</sub> Baseband I/Q vector
- LDO Low-Drop-Out voltage regulator
- LINC Linear Amplification using Nonlinear Components
- LO Local Oscillator
- LPF Low-Pass Filter
- LS Least-Squares
- LSB Least Significant Bit
- MSB Most Significant Bit
- **OFDM** Orthogonal Frequency-Division Multiplexing
- PA Power Amplifier
- PAE Power-Added Efficiency

- PAPR Peak-to-Average Power-Ratio
- PLL Phase-Locked Loop
- PM Phase Modulation/Phase-modulated
- PVT process, voltage, and temperature
- Q Quadrature-phase
- QAM Quadrature Amplitude Modulation
- QPSK Quadrature Phase-Shift Keying
- SE System Efficiency
- SNR signal-to-noise ratio
- SRAM static random access memory
- SSR Sampling Spectral Replica
- RF Radio Frequency
- RFDAC RF Digital-to-Analog Converter
- RFIC-TX radio frequency integrated circuit transmitter
- RMS root mean square
- RRC root raised cosine
- RX Receiver
- TX Transmitter
- VR Virtual Reality
- ZOH Zero-Order-Hold

# **LIST OF FIGURES**

| 1.1 | Evolution of RF power generation                                                           | 3  |
|-----|--------------------------------------------------------------------------------------------|----|
| 1.2 | Data rate trends in wireless and wireline communication systems                            | 4  |
| 1.3 | 4-QAM (left) and 16-QAM (right) constellation diagrams.                                    | 5  |
| 1.4 | Concepts of the three main TX architectures                                                | 6  |
| 1.5 | Conventional analog-intensive Cartesian TX.                                                | 9  |
| 1.6 | Conventional analog-intensive Polar TX with the resulting spectra                          | 12 |
| 1.7 | Digital-intensive Cartesian TX.                                                            | 13 |
| 1.8 | Digital-intensive polar TX                                                                 | 13 |
| 1.9 | Example of two-tone input/output signals of a nonlinear system with a                      |    |
|     | third-order nonlinearity                                                                   | 14 |
| 2.1 | (a) Current-mode IQ-RFDAC, and (b) RFDAC-based phase modulator with                        |    |
|     | harmonic rejection                                                                         | 23 |
| 2.2 | Concept of amplitude modulation by a digital PA                                            | 25 |
| 2.3 | Cartesian upsmapling vs. polar upsampling                                                  | 26 |
| 2.4 | Conceptual operation comparison of a transconductance and a switch-                        |    |
|     | mode PA                                                                                    | 27 |
| 2.5 | (a) Class-E PA circuit, (b) calculated optimum values of $K_L$ , $K_C$ , $L_P$ , $K_D$ vs. |    |
|     | $q_D$ with ZVS and ZdVS, and a 50% duty cycle from $q_D$ =0.6 to $q_D$ =1.8 [14],          |    |
|     | and (c) drain voltage and current for different designs with { $\alpha = 0, q_D =$         |    |
|     | 1.41} and $\{\alpha = 1, q_D = 1.23\}$ .                                                   | 29 |
| 2.6 | Simplified single-ended class-E DPA with the ACW-AM and ACW-PM curves.                     | 31 |
| 2.7 | (a) CW drain efficiency vs normalized output voltage compared to the av-                   |    |
|     | erage and CW drain efficiencies of ideal class-A and class-B PAs, as well as               |    |
|     | the probability distribution function (PDF) of a QAM signal, and (b) the DE                |    |
|     | correction factor $C_{DE}$ (dB) vs. PAPR (dB)                                              | 33 |
| 2.8 | (a) Conventional symmetrical current-mode Doherty PA, (b) simplified mode                  | l, |
|     | and (c) concept of voltage-mode Doherty PA                                                 | 34 |

| 2.9  | (a) Drain voltages in an ideal symmetrical current-mode Doherty PA, (b)<br>load modulation seen by the Main and Peak PAs, (c) drain currents, (d)<br>drain efficiencies assuming class-B operation, as well as the PDF of OAM |     |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
|      | signal.                                                                                                                                                                                                                       | 35  |
| 2.10 | (a) Concept of the digital Doherty PA, and (b) simplified class-E digital Doherty PA with the ACW-AM and ACW-PM curves.                                                                                                       | 36  |
| 3.1  | (a) Continuous-time model of a Volterra series, and (b) discrete-time im-                                                                                                                                                     |     |
|      | plementation of a memory polynomial model                                                                                                                                                                                     | 45  |
| 3.2  | (a) Different orders of the proposed basis functions $\psi_{i,AM-AM}$ and $\psi_{i,AM-AM}$<br>and (b) an example of modeling a highly nonlinear ACW-AM curve using                                                            | -1, |
|      | the MP model (order of 11) and the proposed model (order of 3)                                                                                                                                                                | 51  |
| 3.3  | (a) Basic concept of predistortion, (b) example of the input/output spectrum of a memoryless nonlinear PA (5<sup>th</sup>-order), with and without DPD (9<sup>th</sup>-order). | 54  |
| 3.4  | Conventional adaptive DPD.                                                                                                                                                                                                    | 54  |
| 3.5  | (a) Data DPD vs. signal DPD, (b) the resulting output spectrum using different interpolation filters. | 55  |
| 3.6  | Polyphase LUT-DPD                                                                                                                                                                                                             | 56  |
| 3.7  | Two-stage hybrid DPD (Memory mathematical DPD with LUT) for a polar TX: (a) block diagram, and (b) output spectra. | 57  |
| 3.8  | (a) Indirect-learning DPD, and (b) direct-learning DPD.                                                                                                                                                                       | 58  |
| 4.1  | (a) Conventional polar DPA with "linear sizing" resulting in ACW-AM and ACW-PM distortion, and (b) proposed polar DPA with "nonlinear sizing", "multiphase RF clocking" and "overdrive voltage control". | 71  |
| 4.2  | (a) Conventional single-ended 9-bit class-E DPA, and (b) time domain output waveforms showing ACW-AM and ACW-PM distortion. | 71  |
| 4.3  | Simulated (a) DC curves of $I_{DS}$ vs. $V_{DS}$ for different $K_W$ with a fixed $V_{GS} = 1.1$ V, (b) dynamic load lines for a typical class-E DPA with $V_{DD} = 0.5$ V, (c) drain voltage waveforms, and (d) drain current waveforms. | 73  |
| 4.4  | (a) Push-pull class-E DPA, (b) its lumped model, (c) its odd-mode half circuit, and (d) its odd-mode half circuit LTI (Norton equivalent) model for simplified theoretical analysis of linearity. | 74  |
| 4.5  | Simulated and calculated (a) ACW-AM conversion curves of a class-E DPA, and (b) ACW-PM conversion curve of a class-E DPA. | 75  |

| 4.6  | Simulated DE of an ideal class-$E/F_2$, class-E and class-B (D)PA vs. normalized output voltage. | 76 |
|------|----------------------------------------------------------------------------------------|----|
| 4.7  | (a) Total effective size $W_{eff}$ ($\mu$m) vs. ACW, (b) simulated normalized output AM vs. $W_{eff}$ ($\mu$m), and (c) resulting simulated ACW-AM curves for a DPA with linear sizing, nonlinear sizing, and segmented nonlinear sizing. | 77 |
| 4.8  | Simulated (a) ACW-AM and (b) output PSD of a nonlinearly sized DPA for different numbers of segments assuming no ACW-PM or other type of non-ideality. | 78 |
| 4.9  | Concept of overdrive-voltage tuning technique to control the linearity of the ACW-AM curve. | 79 |
| 4.10 | Simulated (a) ACW-AM curves of a scenario showing how to correct for the process variation from the TT to the FF corner by controlling the overdrive voltage, and (b) the effect of temperature variation on normalized ACW-AM and ACW-PM conversion curves. | 81 |
| 4.11 | (a) Basic concept of multiphase RF clocking, and (b) the resulting simulated phase distortions of a DPA with conventional single-phase RF clocking and multiphase RF clocking. | 81 |
| 4.12 | (a) Simplified LTI model of multiphase RF clocking, phasor representation of the output signal and the currents of each segment for (b) a conventional DPA, (c) a DPA with multiphase RF clocking requiring positive phase offsets, and (d) a DPA with multiphase RF clocking with negative phase offsets implementable by positive delay offsets. | 82 |
| 4.13 | (a) Flowchart of delay offset optimization for ACW-PM correction, and (b) the simulated effect of multiphase RF clocking on ACW-AM conversion. | 84 |
| 4.14 | Capacitive harmonic tuning for efficiency enhancement: (a) circuit, and (b) power and efficiency simulation vs. duty cycle. | 85 |
| 4.15 | (a) Overall block diagram of the proposed DPA, (b) the circuit of sub-PA, and (c) the single-ended to differential converter. | 87 |
| 4.16 | Chip micrograph (core area = $1 \text{ mm} \times 0.45 \text{ mm}$ ).                  | 88 |
| 4.17 | (a) 6-bit digitally programmable on-chip LDO designed for overdrive-voltage tuning, and (b) IQ trajectory of the effect of tuning the LDO setting on ACW-PM linearity. | 88 |
| 4.18 | Structure of the 4-bit fine-resolution delay line and its delay-cells.                 | 88 |
| 4.19 | AM/PM timing mismatch correction by (a) coarse delay line, and (b) digital FIR filter implemented as a fractional delay. | 89 |

| 4.20 (a) Layout of the balun, and (b) final design with ground shielding and output pads. | 90     |
|----------------------------------------------------------------------------------------------------------------|--------|
| 4.21 Electromagnetic (EM) simulation results of the on-chip balun.                                             | 91     |
| 4.22 EM simulation results of the loaded input impedance of the on-chip balun.                                 | 91     |
| 4.23 Chip micrograph showing the DPA designed for the off-chip MN.                                             | 92     |
| 4.24 Conceptual structure of the off-chip MN with a compensated Marchand balun as well as the connection of the DPA to the MN with the realized parallel and series resonators. | 92     |
| 4.25 (a) Compensated Marchand balun with second harmonic termination implemented by a via, and (b) the measured and simulated differential-to-single-ended transmission loss. | 93     |
| 4.26 Fabricated PCB of the off-chip Marchand balun.                                                            | 94     |
| 4.27 Measurement setup                                                                                         | 94     |
| 4.28 (a) Measured peak DE (%), PAE (%), and $P_{OUT}$ (dBm) of the DPA with the on-chip MN vs. carrier frequency for $V_{DD}$ = 0.5, 0.6, and 0.7 V; (b) measured $P_{OUT}$ and $P_{DC}$ normalized to $P_{DC,Max}$, DE and PAE vs. normalized output amplitude at 2.2 GHz with $V_{DD}$ = 0.5 V, showing a linear roll-off for DE similar to class-B. | 95     |
| 4.29 (a) Measured peak DE (%), PAE (%), and $P_{OUT}$ (dBm) vs. carrier frequency, and (b) ACW-AM and ACW-PM of the DPA with the off-chip MN with $V_{DD}$ = 0.7 V. | 96     |
| 4.30 Measured semi-static linearity of the DPA with the on-chip MN using a triangle signal at 2 GHz and $V_{DD}$ = 0.5 V: (a) ACW-AM for various LDO settings, and (b) ACW-PM after each iteration of the optimization algorithm. The numbers in the brackets show the codes of the five delay offsets. | 96     |
| 4.31 Measured ACW-AM and ACW-PM of the DPA with the on-chip MN under load variations: (a) before correction, and (b) after correcting the LDO and delay offset settings. | 97     |
| 4.32 Measured spectrum and constellation diagram of the DPA with the on-chip MN: (a) 20 MHz 64-QAM signal, and (b) 20 MHz OFDM 64-QAM signal. | 98     |
| 4.33 Measured spectrum and constellation diagram of the DPA with the on-chip MN with (a) a 40 MHz OFDM 64-QAM signal, and (b) an 80 MHz OFDM 64-QAM signal. | 98     |
| 4.34 Measured spectrum and constellation diagram of the DPA with the off-chip MN with a 40 MHz 64-QAM signal centered at $f_C = 2.6$ GHz with $V_{DD} = 0.7$ V. | 99     |

| 4.35 | Measured spectrum of the DPA with the on-chip MN with 20 MHz OFDM signals under different (a) load conditions and (b) $V_{DD}$s. | 99  |
|------|------|-----|
| 4.36 | Measured spectrum of the DPA with the on-chip MN with (a) 40 MHz and (b) 80 MHz OFDM signals under different $V_{DD}$s. | 100 |
| 4.37 | Measured out-of-band spectrum of the DPA with the on-chip MN with a 20 MHz OFDM signal measured at $f_C = 2$ GHz. | 100 |
| 5.1  | Digital-intensive Polar TX with digital Doherty PA. | 108 |
| 5.2  | Simplified single-ended structure of a class-E Doherty PA with a TL-based impedance inverter. | 110 |
| 5.3  | (a) Single push-pull class-E PA, and (b) angle of the impedance seen by drain. | 110 |
| 5.4  | (a) Conventional and compensated impedance inverter, (b) Smith chart showing the input impedances at 6 dB PBO, (c) normalized magnitude and angle of the input impedance at 6 dB PBO vs. normalized frequency, and (d) Doherty PA with a compensated impedance inverter and the ideal class-B drain efficiency curves at 6 dB PBO vs. frequency. | 111 |
| 5.5  | (a) Marchand balun, (b) compensated Marchand balun, and (c) compensated Marchand balun with re-entrant coupled lines. | 113 |
| 5.6  | (a) AC grounding at $\lambda_{even}/8$ for second harmonic control, using a via from the floating metal layer to the ground plane, shown for a single PA, (b) EM simulation results of the input impedance at 2<sup>nd</sup> harmonic, shown for a
single PA.                                                                                                       | 114 |
| <ul> <li>5.8 Simplified single-ended Norton-equivalent LTI model of (a) the digital Doherty PA, (b) main DPA for ACW &lt; ACW<sub>MP,Max</sub>, (c) main DPA for ACW &gt; ACW<sub>MP,Max</sub>, and (d) and the peak DPA for ACW &gt; ACW<sub>MP,Max</sub>.</li> <li>5.9 Calculated and simulated ACW-AM and ACW-PM curves.</li> <li>5.10 (a) ACW-AM curve of a 10-bit Doherty DPA with and without an ideal DPD, along with the inverse of the ACW-AM curve, and (b) the effect of a 10-bit nonuniform quantizer on the output spectrum, compared with an ideal 10-bit quantizer and 13-bit nonuniform quantizer.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 5.7  | (a) Simplified linearly sized single-ended class-E digital Doherty PA with compensated impedance inverter, and (b) ACW and effective width of each branch vs. the input ACW.                                                                         
                                                                                                                | 115 |
| <ul> <li>5.9 Calculated and simulated ACW-AM and ACW-PM curves.</li> <li>5.10 (a) ACW-AM curve of a 10-bit Doherty DPA with and without an ideal DPD, along with the inverse of the ACW-AM curve, and (b) the effect of a 10-bit nonuniform quantizer on the output spectrum, compared with an ideal 10-bit guantizer and 13-bit nonuniform guantizer.</li> <li>119</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 5.8  | Simplified single-ended Norton-equivalent LTI model of (a) the digital Doherty PA, (b) main DPA for $ACW < ACW_{MP,Max}$ , (c) main DPA for $ACW > ACW_{MP,Max}$ , and (d) and the peak DPA for $ACW > ACW_{MP,Max}$                                 
                                                                                                                | 118 |
| <ul> <li>5.10 (a) ACW-AM curve of a 10-bit Doherty DPA with and without an ideal DPD, along with the inverse of the ACW-AM curve, and (b) the effect of a 10-bit nonuniform quantizer on the output spectrum, compared with an ideal 10-bit quantizer and 13-bit nonuniform quantizer.</li> <li>119</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 5.9  | Calculated and simulated ACW-AM and ACW-PM curves 1                                                                                                                                                                                                  
                                                                                                                | 118 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 5.10 | <ul> <li>(a) ACW-AM curve of a 10-bit Doherty DPA with and without an ideal DPD, along with the inverse of the ACW-AM curve, and (b) the effect of a 10-bit nonuniform quantizer on the output spectrum, compared with an ideal 10-bit quantizer and 
13-bit nonuniform quantizer.</li> </ul>                                                                         | 119 |

| 5.11 | (a) EVM and ACPR of a 64-QAM signal vs. AM-PM and main-peak timing mismatch normalized to 1/BW, and (b) block diagram of AM-PM and main-peak timing mismatch correction in a digital polar TX with Doherty DPA. | 120 |
|------|------|-----|
| 5.12 | (a) Concept of nonlinear sizing to its full extent in a digital Doherty PA, and (b) the total width of the main and peak DPA vs. ACW along with the resulting linear ACW-AM curve.                                         | 122 |
| 5.13 | (a) Concept of segmented nonlinear sizing shown for a digital PA, and (b) the resulting ACW-AM curve for a Doherty DPA with 8 segments in each main/peak DPA.                                                              | 123 |
| 5.14 | Concept of multiphase RF clocking for ACW-PM correction shown for a nonlinearly sized Doherty DPA.                                                                                                                         | 124 |
| 5.15 | (a) Overall structure of main/peak DPA, (b) the conceptual layout of the nonlinearly sized array where only the output MSB transistors are shown, and (c) the realized sizing of the MSB sub-PA cells of the main and peak DPA compared to a conventional uniform array with the same total size. | 126 |
| 5.16 | (a) 6-bit programmable LDO for overdrive voltage tuning and the sub-PA unit cell circuit, and (b) 6-bit programmable duty-cycle correction (DCC) circuit.                                                                  | 127 |
| 5.17 | (a) Impulse response (coefficients), and (b) the frequency response of the FIR-based digital fractional delay for a delay of 200 ps with a sampling rate of 500 MHz.                                                       | 128 |
| 5.18 | Die micrograph of the main and peak DPAs. | |
| 5.19 | (a) Conceptual structure of the proposed Doherty matching network, (b) connection of the DPA to the matching network, and (c) final realization of the proposed Doherty matching network. | 129 |
| 5.20 | (a) PCB photograph, (b) the structure of wire-bonding the DPA chips to the matching network, and (c) the side view of the dc feed/decoupling and RF path connections. | 131 |
| 5.21 | Measurement setups used for (a) static, and (b) dynamic measurements. | 132 |
| 5.22 | Measured (a) $P_{OUT}$ at full power and power backoff for different $VDD_{Main}$, (b) drain efficiency with $VDD_{Main} = 0.6\,\mathrm{V}$ vs. center frequency, (c) drain efficiency, and (d) power-added efficiency with $VDD_{Main} = 0.6\,\mathrm{V}$ vs. output power. | 133 |

| 5.23 | Measured static (a) ACW-AM, and (b) ACW-PM at $F_C = 2.5$ GHz without using DPD, compared to a simulated digital Doherty PA using conventional uniformly sized output stages. | 134 |
|------|------|-----|
| 5.24 | Measured results of a 16 MHz OFDM signal at $F_C = 2.5$ GHz: output spectrum (a) without DPD, and (b) with ILC DPD; (c) ACW-AM with and without ILC DPD, and (d) ACW-PM with and without ILC DPD. | 135 |
| 5.25 | Measured spectrum of a 32 MHz OFDM signal at $F_C = 2.5$ GHz with ILC DPD.                           | 136 |
| 6.1  | (a) Conventional digital polar TX with a switch-mode DPA showing ACW-AM and ACW-PM nonlinearities, and (b) a linearized DPA using nonlinear sizing, overdrive-voltage control, and multiphase RF clocking. | 145 |
| 6.2  | (a) Residual sampling spectral replicas (SSRs) of AM and PM signals, (b) simulated output spectrum of a digital polar TX with $F_C = 2$ GHz, $BW_0 = 128$ MHz, and a square wave carrier signal before and (c) after filtering the SSRs of the PM signal. | 147 |
| 6.3  | (a) Exaggerated ACW-AM curves of a nonlinear DPA, a linearized nonlinear DPA w/ DPD as a nonuniform quantizer, and an intrinsically linear DPA as a uniform quantizer, (b) resultant simulated output spectra, and (c) normalized quantization error (LSB) of a nonlinear DPA with DPD and the proposed intrinsically linear DPA with DPD compared to an ideal linear DPA. | 148 |
| 6.4  | Simulated ACPR and EVM as well as the calculated QN for a 16 MHz OFDM signal up-sampled to $F_S = 500$ MHz with PAPR = 9.2 dB. | 149 |
| 6.5  | (a) Concept of the ILC technique with an almost linear DPA, and (b) block diagram of the ILC technique assisted by LUTs for fast convergence. | 149 |
| 6.6  | Block diagram of the proposed DPD: (a) general, and (b) simplified form.                             | 151 |
| 6.7  | Conceptual block diagrams of (a) conventional direct-learning DPD and (b) conventional indirect-learning DPD compared to (c) the proposed direct-learning DPD. | 152 |
| 6.8  | Measurement Setup.                                                                                   | 153 |
| 6.9  | Measured output spectrum of (a) 16 MHz OFDM signal with LUT-DPD and offline ILC-DPD after two iterations, and (b) 64 MHz OFDM signal with the proposed real-time DPD. | 153 |
| 6.10 | Measured output spectrum of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals. | 154 |

| 6.11 | Measured AM-AM (gain error) of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals. | 154 |
|------|------|-----|
| 6.12 | Measured AM-PM (phase error) of (a) 16 MHz OFDM and (b) 64 MHz OFDM signals. | 155 |

# **LIST OF TABLES**

| 3.1 | Eight different types of DPD.                                       | 53  |
|-----|---------------------------------------------------------------------|-----|
| 4.1 | Measured and simulated power breakdown of the DPA with the on-chip MN at $f_C = 2$ GHz. | 101 |
| 4.2 | Performance summary and comparison with the prior art               | 102 |
| 5.1 | Performance Summary and Comparison Table with Prior Art Digital Doherty PAs. | 138 |
| 6.1 | Comparison of different DPD techniques in this work on linearity    | 155 |

## ACKNOWLEDGEMENTS

*This chapter of life has come to an end, but the story is yet to continue. Saadi of Shiraz, 1210-1291* 

Throughout my PhD project, I have received a great deal of support and assistance from many people. I would first like to express my deepest gratitude to my advisor and promotor, Prof. Leo de Vreede, for his supervision, encouragement, and support. I would also like to extend my deepest appreciation to my friend Dr. Morteza Alavi for his support, mentorship, and the many fruitful discussions. My appreciation extends to my committee members: Prof. Baltus, Prof. Nauta, Prof. Wambacq, Prof. Staszewski, Dr. Pires, and Prof. Llombart for their comments, discussions, and time. I am also grateful to other professors in the Microelectronics department of TU Delft: Dr. Babaie, Dr. Spirito, Dr. van Leuken, Prof. Makinwa, Prof. Leus, Dr. Vermolen, Dr. Pertijs, and last but not least, the late Prof. Earl McCune.

I would like to thank Marion de Vlierger, the very kind and supportive secretary of the ELCA group, as well as Atef Akhnoukh, Marco Pelk, Wil Straver, Ali Kaichouhi, and Zou Yao for the technical support during the tape-outs and the measurements. I'd like to acknowledge the Dutch Research Council (NWO) and express my gratitude to Nick Pulsford, Mustafa Acar, Fred van Rijs, and John Gajadharsing from Ampleon for supporting my research.

I must extend my gratitude to my friends and colleagues on the 18<sup>th</sup> floor: my officemates, Mahdi Salarpour, Ying Wu, Jordi van der Meulen, and Huizhen Qian; and my teammates, Yiyu Shen, Lei Zhou, and Michael Polushkin. Further, Nawaf Almotairi, Rob Bootsman, Dieuwert Mul, Xun Lou, Zhirui Zong, Gerasimos Vlachogiannakis, Satoushi Malotaux, Carmine De Martino, Luca Galatro, Zhebin Hu, Yue Chen, Augusto Ximenes, Amir Reza Ahmadi Mehr, Arash Noroozi, Ronaldo Martins da Ponte, Alessandro Urso, Kambiz Nanbakhsh, Gustavo Martins, and my colleagues at ItoM deserve a mention.

My appreciation goes to my friends from the Iranian community in the Netherlands: Morteza and Haleh, Masoud and Mina, Bahman and Joelle, Saleh and Samira, Milad and Bahar, Aydin and Bahareh, Saeed and Hengameh, Milad and Maryam, Ali and Bita, Hanieh, Mohsen and Nasim, Mohammadreza and Boshra, Masoud and Mahsa, Mohammad Ali, Mohi and Somi, Pooyan, Armin, and Somayyeh. My appreciation also goes to my friends from Iran: Ehsan, Mohsen, Mohsen (yes, another Mohsen), Milad, and Mahyar, for the wonderful friendship that we have shared for more than 18 years.

My deepest gratitude to my lovely family. To my father, Ahamd; my mother, Zohreh; and my sisters, Elaheh, Elham, Shima, and Mahsa, who have always been there for me with their complete and unconditional support and love. Last but not least, I would like to express my greatest gratitude to my beloved wife, Negin. I would not be here if it was not for your sacrifices. I cannot thank you enough for that. You are my best friend. I also would like to thank your family for their kindness and thoughtfulness.

> Mohsen Hashemi Eindhoven, December 2020

# **CURRICULUM VITÆ**

### Mohsen Hashemi

1984 Born in Eilam (Ilam), Iran.

### **EDUCATION**

| 2002–2006 | B.Sc. in Electrical Engineering (Electronics) |
|-----------|-----------------------------------------------|
|           | Shahid Beheshti University, Tehran, Iran |
| 2008–2011 | M.Sc. in Electrical Engineering (Microelectronics) |
|           | Sharif University of Technology, Tehran, Iran |
| 2014–2020 | Ph.D. in Microelectronics |
|           | Delft University of Technology, Netherlands |
|           | Thesis: Energy Efficient and Intrinsically Linear Digital Polar Transmitters |
|           | Promotor: Prof. dr. ing. Leo. C. N. de Vreede |

### WORK

| 2018–present | Senior RF/Analog IC Design Engineer, ItoM B.V., Eindhoven, Netherlands |
|--------------|------------------------------------------------------------------------|
| 2012–2013    | Director of RF R&D, Baregheh Company, Tehran, Iran                     |
| 2007–2010    | Electrical Engineer, Ilam Petrochemical Company, Ilam, Iran            |

## **LIST OF PUBLICATIONS**

#### **CONFERENCES**

1. **M. Hashemi**, M. S. Alavi, and L. C. N. de Vreede, *Pushing the Linearity Limits of a Digital Polar Transmitter*, 2018 13th European Microwave Integrated Circuits Conference (EuMIC), Madrid, 2018, pp. 174-177.
2. **M. Hashemi**, L. Zhou, Y. Shen, M. Mehrpoo, and L. de Vreede, *Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and -40 dBc ACLR without DPD*, 2017 IEEE MTT-S International Microwave Symposium (IMS), Honolulu, HI, 2017, pp. 2025-2028.
3. **M. Hashemi**, Y. Shen, M. Mehrpoo, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, *17.5 An intrinsically linear wideband digital polar PA featuring AM-AM and AM-PM corrections through nonlinear sizing, overdrive-voltage control, and multiphase RF clocking*, 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, pp. 300-301.
4. M. R. Beikmirza, Y. Shen, M. Mehrpoo, **M. Hashemi**, D. Mul, L. de Vreede, and M. S. Alavi, *A 4-Way Doherty Digital Transmitter Featuring 50%-LO Signed IQ Interleave Upconversion with more than 27 dBm Peak Power and 40% Drain Efficiency at 10 dB Power Back-Off Operating in the 5 GHz Band*, accepted for publication in the 2021 IEEE International Solid-State Circuits Conference (ISSCC).
5. Y. Shen, M. Mehrpoo, **M. Hashemi**, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, *A wideband I/Q RFDAC-based phase modulator*, 2018 IEEE 18th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF), Anaheim, CA, 2018, pp. 8-11.
6. M. Mehrpoo, **M. Hashemi**, Y. Shen, R. van Leuken, M. S. Alavi, and L. C. N. de Vreede, *A wideband linear direct digital RF modulator using harmonic rejection and I/Q-interleaving RF DACs*, 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Honolulu, HI, 2017, pp. 188-191.
7. Y. Shen, M. Polushkin, M. Mehrpoo, **M. Hashemi**, E. McCune, M. S. Alavi, and L. de Vreede, *A fully-integrated digital-intensive polar Doherty transmitter*, 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Honolulu, HI, 2017, pp. 196-199.

#### **JOURNALS**

1. **M. Hashemi**, L. Zhou, Y. Shen, and L. C. N. de Vreede, *A Highly Linear Wideband Polar Class-E CMOS Digital Doherty Power Amplifier*, IEEE Transactions on Microwave Theory and Techniques, vol. 67, no. 10, pp. 4232-4245, Oct. 2019.
2. **M. Hashemi**, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, *An Intrinsically Linear Wideband Polar Digital Power Amplifier*, IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3312-3328, Dec. 2017.
3. M. Mehrpoo, **M. Hashemi**, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, *A Wideband Linear I/Q-Interleaving DDRM*, IEEE Journal of Solid-State Circuits, vol. 53, no. 5, pp. 1361-1373, May 2018.

#### PATENTS

1. **M. Hashemi** and L. C. N. de Vreede, *An Intrinsically Linear Wideband Digital Polar PA featuring AM-AM and AM-PM Corrections through Nonlinear Sizing, Overdrive Voltage Control, and Multi-Phase RF Clocking*, Patent No. US 10659091 B2, May 2020.
2. L. C. N. de Vreede, S. M. Alavi, M. Mehrpoo, M. E. Polushkin, **M. Hashemi**, and Y. Shen, *RF-DAC Based Phase Modulator*, Patent No. US 10644656 B2, May 2020.