Memory and Communication Profiling for Accelerator-Based Platforms

Journal Article (2018)

Author(s)

Imran Ashraf (TU Delft - QuTech Advanced Research Centre, TU Delft - Computer Engineering)

Nader Khammassi (TU Delft - QuTech Advanced Research Centre)

M. Taouil (TU Delft - Computer Engineering)

K. Bertels (TU Delft - Quantum & Computer Engineering, TU Delft - QuTech Advanced Research Centre)

Research Group

Computer Engineering

DOI related publication

https://doi.org/10.1109/TC.2017.2785225

Open source software, Acceleration, Computer architecture, Tools, Instruments, Field programmable gate arrays, Graphics processing units

To reference this document use:

https://resolver.tudelft.nl/uuid:a9302961-c459-4aec-af4b-a2021f75ec4f

Publication Year

2018

Language

English

Issue number

7

Volume number

67

Pages (from-to)

934-948

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The growing demand for processing power is being met mainly by increasing the number of homogeneous and heterogeneous computing cores in a system. Using these architectures efficiently requires analysing the memory-access behaviour of applications and mapping applications onto these architectures in a data-communication-aware manner. Appropriate tools are required to highlight memory-access patterns and provide detailed intra-application data-communication information to assist developers in porting existing sequential applications efficiently to these architectures. In this work, we present the design of an open-source tool which provides such a detailed profile for C/C++ applications. In contrast to prior work, our tool not only reports detailed information, but also generates this information with manageable overheads for realistic workloads. Comparison across a wide range of benchmarks shows that the proposed profiler has, on average, an order of magnitude less overhead than state-of-the-art data-communication profilers. The experimental results show that the profiling information generated by our tool for image processing applications assisted in achieving speed-ups of 6.14× and 2.75× on heterogeneous multi-core platforms containing an FPGA and a GPU as accelerators, respectively.

Files

37936803_main.pdf

(pdf | 6.91 Mb)

License info not available