Efficient Execution of Video Applications on Heterogeneous Multi- and Many-Core Processors

Doctoral thesis (2011)

Authors

A. Pereira de Azevedo Filho

Contributors

B.H.H. Juurlink (promotor)

Department

Software Technology () (TU Delft)

Video Processing Parallel Processing Processor Architecture Scratchpad Memory Software Cache SIMD Processing

To reference this document use:

http://resolver.tudelft.nl/uuid:09179df1-7418-4b33-8a75-e0c557c6079c

More Info

expand_more

Published Date

06-06-2011

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

In this dissertation we present methodologies and evaluations aiming at increasing the efficiency of video coding applications for heterogeneous many-core processors composed of SIMD-only, scratchpad memory based cores. Our contributions are spread in three different fronts: thread-level parallelism strategies for many-cores, identification of bottlenecks for SIMD-only cores, and software cache for scratchpad memory based cores. First, we present the 3D-Wave parallelization strategy for video decoding that scales for many-core processors. It is based on the observation that dependencies between frames are related with the motion compensation kernel and motion vectors are usually within a small range. The 3D-Wave strategy combines macroblock-level parallelism with frame- and slice-level parallelism by overlapping the decoding of frames while dynamically managing macroblock dependencies. The 3D-Wave was implemented and evaluated in a simulated many-core embedded processor consisting of 64 cores. Policies for reducing memory footprint and latency are presented. The effects of memory latency, cache size, and synchronization latency are studied. The assessment of SIMD-only cores for the increasing complexity of current multimedia kernels is our second contribution. We evaluate the suitability of SIMD-only cores for the increasing divergent branching in video processing algorithms. The H.264 Deblocking Filter is used as test case. Also, the overhead imposed by the lack of a scalar processing unit for SIMD-only cores is measured using two methodologies. Low area overhead solutions are proposed to add scalar support to SIMD-only cores. Finally, we focus on the memory hierarchy and we propose a new software cache organization to increase the efficiency and efficacy of scratchpad memories for unpredictable and indirect memory accesses. The proposed Multidimensional Software Cache reduces software cache overhead by allowing the programmer to exploit known access behavior in order to reduce the number of accesses to the software cache and by grouping memory requests. An instruction to accelerate MDSC lookup is also presented and analyzed.

Files

Thesis_EXV.pdf

(pdf | 1.71 Mb)

Unknown license