Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures

Abstract

Recent trends show a steady increase in the use of heterogeneous multicore architectures to address the ever-growing demand for computing performance. These emerging architectures pose specific programmability challenges. In addition, they require efficient application mapping schemes to fully harness their processing power and avoid bottlenecks. In this respect, it is critically important to analyse application behaviour and, in particular, the data communication between tasks. In this dissertation, we present a profiling framework that helps developers gain insight into the behaviour of an application. The framework is generic and not restricted to a particular platform, application, or purpose; here, we use it primarily to map applications onto a heterogeneous multicore architecture. The framework includes a memory access profiling toolset, called QUAD, that provides quantitative information about the memory accesses in an application. QUAD uses Dynamic Binary Instrumentation (DBI) to detect the actual data dependencies that occur between the tasks of an application at runtime. It also provides accurate memory access measurements, such as the amount of data transferred between tasks and the memory size required for their communication. This information can be used to identify critical parts of an application, to highlight opportunities for coarse-grained parallelism, and to guide code optimizations. As a proof of concept of the usefulness of the extracted profiling information, we use the main output of QUAD, the Quantitative Data Usage (QDU) graph, as the input model for formulating a general application partitioning problem. This problem is intractable, and its formulation is flexible enough to accommodate different design objectives and constraints.
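To make the two QDU quantities concrete, the following Python sketch processes a hypothetical, already-recorded memory access trace instead of instrumenting a running binary as QUAD does with DBI. It keeps a shadow map from each byte address to the task that last wrote it; when a different task reads that byte, the read is attributed to the producer-to-consumer edge. The trace format, the name build_qdu_graph, byte-granular tracking, and the choice to ignore reads of a task's own data are all assumptions made for illustration, not QUAD's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical trace record: (task name, "R" or "W", start address, size in bytes)
Access = Tuple[str, str, int, int]
Edge = Tuple[str, str]  # (producer task, consumer task)


def build_qdu_graph(trace: Iterable[Access]) -> Dict[Edge, Tuple[int, int]]:
    """Return {(producer, consumer): (bytes_transferred, unique_addresses)}.

    Illustrative sketch only: a real DBI-based profiler observes the
    accesses while the program runs instead of reading a prepared trace.
    """
    last_writer: Dict[int, str] = {}  # shadow memory: byte address -> last writing task
    bytes_moved: Dict[Edge, int] = defaultdict(int)
    addresses: Dict[Edge, Set[int]] = defaultdict(set)

    for task, op, addr, size in trace:
        for byte in range(addr, addr + size):  # byte-granular tracking for simplicity
            if op == "W":
                last_writer[byte] = task
            elif op == "R":
                producer = last_writer.get(byte)
                # Assumption: only reads of data produced by a *different* task count
                if producer is not None and producer != task:
                    edge = (producer, task)
                    bytes_moved[edge] += 1      # every read adds to the transferred volume
                    addresses[edge].add(byte)   # distinct bytes = memory needed for the edge
            else:
                raise ValueError(f"unknown access type {op!r}")

    return {edge: (bytes_moved[edge], len(addresses[edge])) for edge in bytes_moved}


if __name__ == "__main__":
    trace = [
        ("load", "W", 0x1000, 16),
        ("filter", "R", 0x1000, 16),
        ("filter", "R", 0x1000, 8),   # re-reads part of the same produced data
        ("filter", "W", 0x2000, 4),
        ("store", "R", 0x2000, 4),
    ]
    for (producer, consumer), (moved, unique) in build_qdu_graph(trace).items():
        print(f"{producer} -> {consumer}: {moved} bytes transferred, {unique} unique addresses")
```

In this toy trace, the second read by filter raises the transferred volume on the load-to-filter edge to 24 bytes while the unique-address count stays at 16, which is the distinction between the amount of data transferred and the memory size required for the communication.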
Subsequently, we propose a heuristic algorithm that finds high-quality partitions of an application in a reasonable amount of time. In addition to analysing the complexity of the proposed algorithm, we present a thorough theoretical analysis of the application partitioning problem. To evaluate the quality of the solutions, we developed a test bench that generates synthetic QDU graphs, and we compared the heuristic's results against optimal partitions obtained by exhaustive search. The comparison shows that the proposed heuristic algorithm provides optimal or near-optimal solutions. To further demonstrate the applicability of the profiling framework, we investigate in detail how it is used in practice by mapping two real applications onto a heterogeneous reconfigurable architecture. To this end, we propose a hardware/software partitioning methodology that introduces the concept of merging tightly-coupled tasks based on the data communication analysis. Moreover, the profiling information is used to fine-tune the applications and optimize their data flow. The results show performance improvements of 192% and 30% for the two applications.
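The abstract does not spell out the partitioning formulation, so the sketch below is only a hypothetical illustration of what an exhaustive-search baseline over a QDU graph can look like, not the dissertation's model or its heuristic. It assumes a simple cost model: each task has a software time, a hardware time and a hardware area; every QDU edge whose producer and consumer end up on different sides of the hardware/software boundary adds a per-byte transfer penalty; and the total hardware area must stay within a budget. The names exhaustive_partition, partition_cost, area_budget and byte_cost are invented for this example.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical cost model per task: (software time, hardware time, hardware area)
TaskCost = Tuple[float, float, float]
Edge = Tuple[str, str]  # (producer, consumer) pair from a QDU graph


def partition_cost(assign: Dict[str, bool], tasks: Dict[str, TaskCost],
                   edges: Dict[Edge, int], byte_cost: float) -> float:
    """Execution time of each task on its assigned side (True = hardware),
    plus a transfer penalty for every QDU edge that crosses the HW/SW boundary."""
    time = sum(tasks[t][1] if hw else tasks[t][0] for t, hw in assign.items())
    for (producer, consumer), nbytes in edges.items():
        if assign[producer] != assign[consumer]:
            time += nbytes * byte_cost
    return time


def exhaustive_partition(tasks: Dict[str, TaskCost], edges: Dict[Edge, int],
                         area_budget: float, byte_cost: float
                         ) -> Tuple[float, Optional[Dict[str, bool]]]:
    """Enumerate all 2**n HW/SW assignments and keep the cheapest feasible one."""
    names = sorted(tasks)
    best_cost: float = float("inf")
    best_assign: Optional[Dict[str, bool]] = None
    for bits in product((False, True), repeat=len(names)):
        assign = dict(zip(names, bits))
        area = sum(tasks[t][2] for t, hw in assign.items() if hw)
        if area > area_budget:
            continue  # violates the (assumed) hardware resource constraint
        cost = partition_cost(assign, tasks, edges, byte_cost)
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


if __name__ == "__main__":
    tasks = {
        "load":   (4.0, 4.0, 5.0),
        "filter": (20.0, 2.0, 40.0),
        "dct":    (30.0, 3.0, 50.0),
        "store":  (4.0, 3.0, 5.0),
    }
    edges = {("load", "filter"): 100, ("filter", "dct"): 400, ("dct", "store"): 100}
    cost, assign = exhaustive_partition(tasks, edges, area_budget=95.0, byte_cost=0.01)
    print(cost, [t for t, on_hw in assign.items() if on_hw])
```

With this made-up data, the all-hardware assignment exceeds the area budget, and the search places filter, dct and store in hardware, paying only for the load-to-filter transfer. Because the loop visits all 2^n assignments, such a baseline is only practical for small graphs, which is where a faster heuristic becomes necessary.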