Title: Parallelizing a Video Filter-chain for Multi- and Many-core Systems
Author: Chi, Huang-Da (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Computer Engineering)
Contributor: Al-Ars, Zaid (mentor); Alvarez-Mesa, Mauricio (mentor)
Degree granting institution: Delft University of Technology
Date: 2018-01-18
Abstract: Developing parallel applications that make efficient use of current and emerging parallel architectures remains a major challenge in modern application development, where performance is a first-class citizen. Optimizing conventional video filter applications to take advantage of current multi- and many-core architectures is one such challenge. This thesis investigates how to parallelize a video filter-chain to obtain maximum performance on Intel's newest many-core architecture, the Knights Landing platform, and on a conventional Haswell Xeon server platform. Implemented optimizations include line parallelization, multiple frames in flight, and AVX-512. Line parallelization and multiple frames in flight are both coarse-grained parallelism strategies that focus on minimizing synchronization and communication overhead, while the AVX-512 optimization is a fine-grained parallelism strategy. The challenges found with the coarse-grained strategies are primarily load-balancing issues. Line parallelization paired with multiple frames in flight achieved a speedup of 27.14x on the 28-core Xeon server system and 95.47x on the Knights Landing system with the compute-intensive 8k color conversion benchmark. Memory-intensive benchmarks such as blend had lower but still decent overall speedups of 9.76x and 25.34x on the Xeon server and Knights Landing platforms, respectively.
The AVX-512 optimization for color conversion and scale yielded single-threaded speedups of 1.41x and 1.60x, respectively. From the experimental data we conclude that data parallelization strategies are very effective for video filter applications. For compute-intensive filters such as color conversion in particular, they can achieve near-linear speedup with the number of cores. The main limitation on speedup found in some other filters is memory bandwidth.
Subject: parallelism, filter, Video, scalability, Performance
To reference this document use: http://resolver.tudelft.nl/uuid:8f168240-026e-47ba-9cd8-4d3e657249aa
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Huang-Da Chi
Files: thesis.pdf (PDF, 10.04 MB)