Memory and Communication Profiling for Accelerator-Based Platforms

Journal Article (2018)
Author(s)

Imran Ashraf (TU Delft - QuTech Advanced Research Centre, TU Delft - Computer Engineering)

Nader Khammassi (TU Delft - QuTech Advanced Research Centre)

M. Taouil (TU Delft - Computer Engineering)

K.L.M. Bertels (TU Delft - Quantum & Computer Engineering, TU Delft - QuTech Advanced Research Centre)

Research Group
Computer Engineering
Copyright
© 2018 I. Ashraf, N. Khammassi, M. Taouil, K.L.M. Bertels
DOI related publication
https://doi.org/10.1109/TC.2017.2785225
Publication Year
2018
Language
English
Issue number
7
Volume number
67
Pages (from-to)
934-948
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The growing demand for processing power is being met mainly by increasing the number of homogeneous and heterogeneous computing cores in a system. Efficient utilization of these architectures demands analysis of the memory-access behaviour of applications and data-communication-aware mapping of applications onto these architectures. Appropriate tools are required to highlight memory-access patterns and provide detailed intra-application data-communication information to assist developers in porting existing sequential applications efficiently to these architectures. In this work, we present the design of an open-source tool which provides such a detailed profile for C/C++ applications. In contrast to prior work, our tool not only reports detailed information, but also generates this information with manageable overheads for realistic workloads. Comparison with state-of-the-art data-communication profilers shows that the proposed profiler has, on average, an order of magnitude less overhead across a wide range of benchmarks. The experimental results show that the profiling information generated by our tool for image processing applications assisted in achieving speed-ups of 6.14× and 2.75× on heterogeneous multi-core platforms containing an FPGA and a GPU as accelerators, respectively.

Files

37936803_main.pdf
(pdf | 6.91 Mb)
License info not available