WL

Wayne Luk

Authored

19 records found

Accelerating Large-Scale Graph Processing with FPGAs

Lesson Learned and Future Directions

Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures s ...

FASTER

Facilitating analysis and synthesis technologies for effective reconfiguration

The FASTER project aims to ease the definition, implementation and use of dynamically changing hardware systems. Our motivation stems from the promise reconfigurable systems hold for achieving better performance and extending product functionality and lifetime via the addition of ...

FPGA-based design using the FASTER toolchain

The case of STM spear development board

Even though FPGAs are becoming more and more popular as they are used in many different scenarios like communications and HPC, the steep learning curve needed to work with this technology is still the major limiting factor to their full success. Many works proposed to mitigate th ...
While fine-grain, reconfigurable devices have been available for years, they are mostly used in a fixed functionality, "asic-replacement" manner. To exploit opportunities for flexible and adaptable run-time exploitation of fine grain reconfigurable resources (as implemented curre ...
Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving dem ...

EXTRA

Towards an efficient open platform for reconfigurable High Performance Computing

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with ...
We demonstrate the use of dataflow technology in the computation of the correlation energy in molecules at the Møller-Plesset perturbation theory (MP2) level. Specifically, we benchmark density fitting (DF)-MP2 for as many as 168 atoms (in valinomycin) and show that speed-ups bet ...

EXTRA

Towards the exploitation of eXascale technology for reconfigurable architectures

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with ...
Processing large-scale graphs is challenging due to the nature of the computation that causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph process ...
We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate dose calculation in real-time during patient trea ...
We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate online dose calculation in real-time during patie ...
We propose a design methodology to facilitate rigorous development of complex applications targeting reconfigurable hardware. Our methodology relies on analytical estimation of system performance and area utilisation for a given specific application and a particular system instan ...
Background and purpose: Physiological motion impacts the dose delivered to tumours and vital organs in external beam radiotherapy and particularly in particle therapy. The excellent soft-tissue demarcation of 4D magnetic resonance imaging (4D-MRI) could inform on intra-fractional ...
This paper proposes an algorithm for mapping logical to physical memory resources on FPGAs. Our greedy strategy based algorithm is specifically designed to facilitate timing closure on modern multi-die FPGAs for static-dataflow accelerators utilising most of the on-chip resources ...
In this paper we discuss a high performance implementation for Convolutional Neural Networks (CNNs) inference on the latest generation of Dataflow Engines (DFEs). We discuss the architectural choices made during the design phase taking into account the DFE chip properties. We the ...
Magnetic Resonance (MR)-guided online Adaptive RadioTherapy (MRgoART) utilises the excellent soft-tissue contrast of MR images taken just before the patient's treatment to quickly update and personalise radiotherapy treatment plans. Four-dimensional (4D) MR Imaging (MRI) can reso ...
In this paper we propose a technique to minimise the area overhead of a double buffered implementation of Radix-4 Fast Fourier Transformation (FFT). Our proposal circumvents the need for double buffering by exploiting opportunities in the specific data reordering of the buffers t ...
Field Programmable Gate Arrays (FPGAs) are gaining popularity in the context of scientific computing due to the recent advances of High-Level Synthesis (HLS) toolchains for customised hardware implementations combined with the increase in computing capabilities of modern FPGAs. A ...
During the last few years, there is an increasing interest in mixing software and hardware to serve efficiently different applications. This is due to the heterogeneity characterizing the tasks of an application which require the presence of resources from both worlds, software a ...