Determining Performance Boundaries and Automatic Loop Optimization of High-Level System Specifications

More Info
expand_more

Abstract

Designers are confronted with high time-to-market pressure and an increasing demand for computational power. As a result, they are required to identify as early as possible the quality of a specification for an intended technology. The designer needs to know if this specification can be improved, and at what cost. Specification trade-offs are often based on the experience and intuition of a designer, which in itself is not enough to make design decisions given the complexity of modern designs. Therefore, we need to identify the performance boundaries for the execution of a specification on an intended technology. The degree of parallelism, required resources, scheduling constraints, and possible optimizations, etc. are essential in determining design trade-offs (e.g., power consumption, execution time, etc). However, existing tools lack the capability of determining relevant performance parameters and the option to automatically optimize high-level specifications to make meaningful design trade-offs. To address these problems, we present in this thesis a new profiler tool, cprof. The Clang compiler front-end is used in this tool to parse high-level specifications, and to produce instrumented source code for the purpose of profiling. This tool automatically determines, from high-level specifications, the degree of parallelism of a given source code, specified in C and C++ programming languages. Furthermore, cprof estimates the number of clock cycles necessary to complete a program, it automatically applies loop optimization techniques, it determines the lower and upper bound on throughput capacity, and finally, it generates hardware execution traces. The tool assumes that the specification is executed on a parallel Model of Computation (MoC), referred to as a Polyhedral Process Network (PPN). The proposed tool adds new functionality to existing technologies: the estimated performance by cprof of PolyBench/C benchmarks, as compared to realistic implementations in Field-Programmable Gate Arrays (FPGA) platforms, showed to be almost identical. Cprof is capable of estimating the lower and upper bound on throughput capacity, making it possible for the designer to make performance trade-offs based on real design points. As a result, only the high-level specification is used by cprof to assist in Design Space Exploration (DSE) and to improve design quality.