WL

Wayne Luk

22 records found

This study introduces a methodology for forecasting accelerator performance in Particle Physics algorithms. Accelerating applications can require significant engineering effort, prototyping and measuring the speedup that might finally result in disappointing accelerator performan ...
Neural Networks (NN) are often trained offline on large datasets and deployed on specialised hardware for inference, with a strict separation between training and inference. However, in many realistic applications the training environment differs from the real world, or data arri ...
Random number generation is key to many applications in a wide variety of disciplines. Depending on the application, the quality of the random numbers from a particular generator can directly impact both computational performance and critically the outcome of the calculation. Hig ...

Accelerating Large-Scale Graph Processing with FPGAs

Lesson Learned and Future Directions

Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures s ...
Background and purpose: Physiological motion impacts the dose delivered to tumours and vital organs in external beam radiotherapy and particularly in particle therapy. The excellent soft-tissue demarcation of 4D magnetic resonance imaging (4D-MRI) could inform on intra-fractional ...
Processing large-scale graphs is challenging due to the nature of the computation that causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph process ...
We propose a design methodology to facilitate rigorous development of complex applications targeting reconfigurable hardware. Our methodology relies on analytical estimation of system performance and area utilisation for a given specific application and a particular system instan ...
Magnetic Resonance (MR)-guided online Adaptive RadioTherapy (MRgoART) utilises the excellent soft-tissue contrast of MR images taken just before the patient's treatment to quickly update and personalise radiotherapy treatment plans. Four-dimensional (4D) MR Imaging (MRI) can reso ...
Field Programmable Gate Arrays (FPGAs) are gaining popularity in the context of scientific computing due to the recent advances of High-Level Synthesis (HLS) toolchains for customised hardware implementations combined with the increase in computing capabilities of modern FPGAs. A ...
We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate dose calculation in real-time during patient trea ...
We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate online dose calculation in real-time during patie ...
This paper proposes an algorithm for mapping logical to physical memory resources on FPGAs. Our greedy strategy based algorithm is specifically designed to facilitate timing closure on modern multi-die FPGAs for static-dataflow accelerators utilising most of the on-chip resources ...
In this paper we propose a technique to minimise the area overhead of a double buffered implementation of Radix-4 Fast Fourier Transformation (FFT). Our proposal circumvents the need for double buffering by exploiting opportunities in the specific data reordering of the buffers t ...
In this paper we discuss a high performance implementation for Convolutional Neural Networks (CNNs) inference on the latest generation of Dataflow Engines (DFEs). We discuss the architectural choices made during the design phase taking into account the DFE chip properties. We the ...
We demonstrate the use of dataflow technology in the computation of the correlation energy in molecules at the Møller-Plesset perturbation theory (MP2) level. Specifically, we benchmark density fitting (DF)-MP2 for as many as 168 atoms (in valinomycin) and show that speed-ups bet ...

EXTRA

Towards the exploitation of eXascale technology for reconfigurable architectures

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with ...

EXTRA

Towards an efficient open platform for reconfigurable High Performance Computing

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with ...

FPGA-based design using the FASTER toolchain

The case of STM spear development board

Even though FPGAs are becoming more and more popular as they are used in many different scenarios like communications and HPC, the steep learning curve needed to work with this technology is still the major limiting factor to their full success. Many works proposed to mitigate th ...
While fine-grain, reconfigurable devices have been available for years, they are mostly used in a fixed functionality, "asic-replacement" manner. To exploit opportunities for flexible and adaptable run-time exploitation of fine grain reconfigurable resources (as implemented curre ...

FASTER

Facilitating analysis and synthesis technologies for effective reconfiguration

The FASTER project aims to ease the definition, implementation and use of dynamically changing hardware systems. Our motivation stems from the promise reconfigurable systems hold for achieving better performance and extending product functionality and lifetime via the addition of ...