Carlos Salazar-García
5 records found
The development of multi-FPGA systems focused on high-performance computing requires high-speed channels with low bandwidth overhead and low latency. In this paper, we propose a multi-FPGA interconnection framework aimed at distributed processing applications. Our solution allows efficient communication between different processing elements distributed among the FPGAs. To evaluate our proposal, we built a multi-FPGA system composed of five Zynq ZC706 FPGA boards capable of hosting a varying number of coprocessors distributed over our custom network. With an aggregate bandwidth of up to 25 Gbps per FPGA board, the interconnection framework reaches a latency of only 200.36 ns, one of the lowest reported in the literature. Experimental results show a computational efficiency of 97.25% with a sustained throughput of 21.4 GFLOPS. Furthermore, the proposed network interconnection architecture is easily portable to the latest generation of FPGAs. This makes the current proposal a competitive option for distributed processing in multi-FPGA systems.
PlasticNet+
Extending multi-FPGA interconnect architecture via Gigabit transceivers
This paper addresses the communication challenges posed by multi-FPGA systems by improving a custom FPGA interconnect architecture using the high-speed transceivers available in modern FPGA development boards. The proposed network interconnection, built upon the PlasticNet architecture, is evaluated using the high-speed serial transceiver in Zynq ZC706 FPGA boards. Results show a best-case latency of only 300 ns, on a par with the known BlueLink framework, with the added advantage of full reconfigurability across the different layers of its network interconnection model. This makes the current proposal a competitive option for the development of distributed, heterogeneous multi-FPGA processing systems.
PlasticNet
A low latency flexible network architecture for interconnected multi-FPGA systems
This paper presents preliminary results of PlasticNet, a custom FPGA interconnect architecture designed for processing-intensive applications that communicate extensively among multiple FPGAs. PlasticNet allows the interconnection of processing nodes (PNs) through a flexible, reliable and efficient custom protocol that can be easily integrated into modern High-Level Synthesis (HLS) design environments. The system is evaluated on a ZedBoard Zynq®-7000 ARM/FPGA SoC Development Board against criteria such as overhead, area, worst-case packet delivery latency and bandwidth. The best evaluated case achieved a half-occupancy latency of 16.9 μs. The results show the potential of PlasticNet as an efficient solution for low-latency multi-FPGA interconnection.
This work proposes a hardware performance-oriented design methodology aimed at generating efficient high-level synthesis (HLS) coded data multiprocessing on a heterogeneous platform. The methodology is tested on a typical complex neuroscientific application: the biologically accurate modeling of a brain region known as the inferior olivary nucleus (ION). The ION cells are described using a multi-compartmental model based on the extended Hodgkin-Huxley membrane model (eHH), which requires the solution of a set of coupled differential equations. The proposed methodology is tested against alternative HPC implementations (a multi-core i7-7820HQ CPU and a Virtex7 FPGA) of the same ION model for different neural network sizes. Results show that the solution runs 4 to 10 times faster than our previous implementation on the same board and closes the performance gap with a Virtex7 implementation without using the AXI-HP channels at full capacity.
A heterogeneous hardware-software system implemented on an Avnet ZedBoard Zynq SoC platform is proposed for the computation of an extended Hodgkin-Huxley (eHH), biologically plausible neural model. The SoC's ARM A9 handles the execution of the individual neurons as defined in the eHH model, each with O(N) computational complexity, while the computation of the gap-junction interactions for each cell is offloaded to the SoC's FPGA, cutting its O(N²) complexity by exploiting parallel-computing hardware techniques. The proposed hw-sw solution allows for speed-ups of about 18 times vis-à-vis a vectorized software implementation on the SoC's cores, and is comparable to the speed of the same model optimized for a 64-bit quad-core Intel i7 at 3.9 GHz.
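To illustrate why the gap-junction step is the natural candidate for offloading, the sketch below computes all-to-all coupling currents with a plain double loop, which makes the O(N²) pairwise work explicit. It is only a minimal illustration: the function name `gap_junction_currents`, the single shared `conductance` value, and the linear (ohmic) coupling term are assumptions made here, not the formulation used in the eHH model or in the paper's hardware design.

```python
def gap_junction_currents(voltages, conductance):
    """Naive all-to-all gap-junction coupling (hypothetical, simplified).

    Assumes a linear coupling current g * (V_i - V_j) between every pair
    of cells; the eHH model may use a different, nonlinear term.
    """
    n = len(voltages)
    currents = [0.0] * n
    for i in range(n):          # one pass per cell ...
        total = 0.0
        for j in range(n):      # ... over every other cell: N * (N - 1) terms
            if i != j:
                total += conductance * (voltages[i] - voltages[j])
        currents[i] = total
    return currents


if __name__ == "__main__":
    # Three cells, one depolarized relative to the other two.
    print(gap_junction_currents([1.0, 0.0, 0.0], 0.5))
```

Each iteration of the inner loop is independent of the others, which is the property that parallel hardware can exploit; the outer per-cell loop corresponds to the part of the work that grows linearly with the number of neurons.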