# Cost Effective Modular Adders for RNS-based Processors


Date: 2010-08-27

Abstract: The Residue Number System (RNS) can distribute the computation on long operands over small word-width RNS functional units able to operate in parallel. This property is the basis for developing fast arithmetic units. While RNS can boost addition and multiplication performance, other arithmetic operations such as division, magnitude comparison, and sign detection are more difficult than their counterparts in the conventional binary number system. For this reason, RNS is mostly utilized for special-purpose applications, e.g., digital filters, which are dominated by addition and multiplication. For such applications the RNS capability to represent large numbers and the carry-free nature of its arithmetic operations are of interest and can potentially enable fast and low-power arithmetic computation. The overall performance of any RNS-based processor is mostly determined by the selected moduli set and by the way the modular operations, i.e., addition and multiplication, are implemented in hardware (note that these two issues are intertwined). In this thesis we concentrate on the design of fast and energy-effective modular adders able to compute |A + B|_m, which equals A + B if A + B < m and A + B - m otherwise, as they are the fundamental building block of any RNS processor. We base our solution on a state-of-the-art approach, i.e., ELM Modular Addition (ELMMA), which utilizes anticipated computation in conjunction with fast parallel prefix addition. Our method follows the same anticipation principle but reduces the overall complexity by proposing an alternative design for the adders, which can now directly handle three inputs instead of two. In this way the initial carry-save addition that ELMMA requires to evaluate A + B - m is no longer needed, which may potentially result in faster and more area- and power-effective designs.
To evaluate the impact of our proposal we considered a number of moduli of practical interest, as follows: 2^n - (2^{n-2} + 1), 2^n - 2^{n-2}, and 2^n - (2^{n-3} + 1). For the considered moduli we produced two sets of VHDL implementations for the n = 16 case, i.e., one for the state-of-the-art ELMMA and one for our proposal. We simulated, debugged, and synthesized the designs using Cadence Encounter RTL Compiler for ASIC designs in a 90 nm CMOS technology. Our results indicate that for moduli 2^n - (2^{n-2} + 1), 2^n - 2^{n-2}, and 2^n - (2^{n-3} + 1), our proposal requires 13%, 32%, and 28% smaller area, is 14%, 3%, and 9% faster, and is 15%, 20%, and 13% more power efficient, respectively, when compared with the state of the art.
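The anticipated-computation idea described in the abstract can be illustrated with a small behavioral sketch. This is not the thesis's VHDL design and does not model the ELM prefix structure; it only shows, in software, how both candidate results (A + B and A + B - m) can be formed and one of them selected. The function name `mod_add`, the constant `MODULI_N16`, and the choice to realize A + B - m as the three-operand sum A + B + (2^n - m) with the carry-out as selector are assumptions made for illustration.

```python
def mod_add(a: int, b: int, m: int, n: int) -> int:
    """Return |a + b|_m for 0 <= a, b < m < 2**n (behavioral sketch)."""
    if not (0 <= a < m and 0 <= b < m and m < (1 << n)):
        raise ValueError("operands must satisfy 0 <= a, b < m < 2**n")

    mask = (1 << n) - 1

    # Candidate 1: the plain two-operand sum A + B.
    s0 = a + b

    # Candidate 2 (assumed realization): the three-operand sum
    # A + B + (2^n - m). Its low n bits equal A + B - m whenever
    # A + B >= m, and its carry-out (bit n) is 1 exactly in that case.
    s1 = a + b + ((1 << n) - m)
    carry_out = s1 >> n

    # Select the anticipated result that lies in [0, m).
    return (s1 & mask) if carry_out else s0


# The three moduli forms from the abstract, evaluated for n = 16.
N = 16
MODULI_N16 = [
    (1 << N) - ((1 << (N - 2)) + 1),  # 2^n - (2^(n-2) + 1)
    (1 << N) - (1 << (N - 2)),        # 2^n - 2^(n-2)
    (1 << N) - ((1 << (N - 3)) + 1),  # 2^n - (2^(n-3) + 1)
]

if __name__ == "__main__":
    for m in MODULI_N16:
        print(m, mod_add(m - 1, m - 1, m, N))
```

In hardware the two candidates would be computed concurrently and the carry-out would drive a multiplexer; the sketch evaluates them sequentially only because it is software.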

Part of collection: Student theses

Document type: master thesis

Rights: (c) 2010 Sukma, O.D.I.