VHDL Implementation of 4096-bit RNS Montgomery Modular Exponentiation for RSA Encryption

More Info
expand_more

Abstract

Modular exponentiation is the basis needed to perform RSA encryption and decryption. Execution of 4096-bit modular exponentiation using an embedded system requires many arithmetic operations. This work aims to improve the performance of modular exponentiation for an existing FPGA platform containing a soft core RISC-V processor. The solution is to introduce a peripheral that performs Montgomery multiplication with 4096-bit operands. These operands are represented using the Residue Number System (RNS) and each residue is assigned to a RNS processor core. In total, the system consists of 121 RNS cores and each core is responsible for a 34-bit residue. RNS Montgomery multiplication requires a base extension algorithm, which will be represented by the Bajard and the Shenoy base extensions. Two designs are proposed: one using a tree-based reduction circuit and one with an iterative reduction circuit. The former focuses on performance overall, while the latter is more efficient in terms of performance per area. Synthesis for a XCKU035 FPGA gives an area usage of 83484 LUTs for the tree-based design and 41857 LUTs for the iterative design. Both designs also require 132.5 BRAMs and 485 DSP blocks and are running at a clock frequency of 400 MHz. The tree-based design performs Montgomery multiplication using 4096-bit operands in 434 cycles (1.09 µs), while the iterative design does that in 577 cycles (1.44 µs). For performing modular exponentiation, the sliding window method is used with an optimal window size of seven. A modular exponentiation with 4096-bit operands takes on average 5.09 ms for the tree-based design, while the iterative version needs 6.78 ms. Few other implementations were found performing Montgomery multiplication using 4096-bit operands. However, compared to those the proposed design provides better performance.

Files