Porting and Evaluation of Overlay Architectures for FPGAs with Scientific Kernels

Abstract

In recent years, due to the slowdown of Moore's Law and Dennard scaling, alternative architectures have started to replace plain CPU implementations. These architectures, such as FPGAs and GPUs, offer a higher performance-to-power ratio than a CPU-only implementation, but they sacrifice programmability in favor of performance gains. While GPUs are relatively easy to program and provide high performance, this comes at the cost of high power consumption. FPGA programming, on the other hand, is a tedious and time-consuming task. It requires specialized personnel with a background in designing with hardware description languages (HDLs). Furthermore, an implementation is specific to a certain algorithm and cannot be reused for another algorithm, even a slightly different one. So if a new algorithm for a particular task is found, part of the design process has to be redone. Designing for FPGAs is also computationally intensive: after simulation, the whole design has to be synthesized and then placed and routed (P&R) for a particular FPGA every time the design changes, even slightly. This mapping process can take hours or even days for large designs. In recent years, developments in High-Level Synthesis (HLS) and OpenCL have made designing for FPGAs easier. But this solution is not without problems either, as the algorithm still has to be implemented for a specific FPGA device. A solution to the FPGA synthesis and P&R problem has recently been proposed under the name of FPGA overlay architectures. The core concept is to abstract the FPGA by creating a virtual FPGA on top of the underlying physical one, in order to reduce configuration and compile time. In this thesis, we investigate available overlay architectures and select the most appropriate one for our analysis.
We extended the selected architecture so that it can be deployed on alternative FPGA hardware and work in a shared CPU/FPGA system. Then, we implemented a number of benchmarks to evaluate various aspects of system performance. Our results show that our architecture can be reconfigured in only 11.9 µs, compared to seconds for full FPGA reconfiguration. However, the overlay uses 10.5x more LUTs and causes a drop in frequency of about 30% for the chosen architecture. For future work, there is room to improve these results by optimizing the interconnect network of the device.

Files

Msc_thesis_konstantinos_gkougk... (.pdf)
(.pdf | 0.812 Mb)
- Embargo expired on 24-01-2018