EasyCompress

Automated Compression for Deep Learning Models

Master Thesis (2023)
Author(s)

A. Van Steenweghen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Luís Cruz – Mentor (TU Delft - Software Engineering)

Rui Maranhao – Graduation committee member (Universidade do Porto)

Arie van Deursen – Coach (TU Delft - Software Technology)

Jan C. van Gemert – Coach (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2023
Language
English
Graduation Date
03-07-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Over the past years, deep learning models have grown consistently in size. This growth has led to significant improvements in performance, but at the expense of increased computational resource demands. Compression techniques can improve the efficiency of deep learning models by shrinking their size and computational needs while preserving performance.


This thesis presents EasyCompress, an automated and user-friendly tool for compressing deep learning models. The tool improves on existing compression research by focusing on generalizability and practical usability in three ways. First, it aligns with specific compression objectives and performance requirements, ensuring the compression accomplishes its intended goal effectively. Second, it employs flexible compression techniques, making it applicable to a diverse set of models without requiring deep model knowledge. Finally, it automates the compression process, eliminating difficult and time-consuming implementation efforts.


EasyCompress intelligently selects, tailors, and combines various compression techniques to minimize model size, latency, or number of computations while preserving performance. It employs structured pruning to reduce the number of parameters and computations, uses knowledge distillation techniques to ensure better accuracy recovery, and uses quantization to achieve additional compression.
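To make the pipeline above concrete, the sketch below illustrates two of the named techniques on a single dense layer: structured pruning (dropping whole output channels by L2 norm, so the layer genuinely shrinks rather than just becoming sparse) and symmetric int8 quantization. This is a minimal NumPy illustration under assumed conventions, not the EasyCompress implementation; the function names and the 50% keep ratio are illustrative choices.

```python
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Structured pruning sketch: drop the output channels (rows) with the
    smallest L2 norm, so parameter count and FLOPs shrink proportionally."""
    norms = np.linalg.norm(weight, axis=1)          # one norm per output channel
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(norms)[-k:])          # indices of the k strongest channels
    return weight[keep]

def quantize_int8(weight: np.ndarray):
    """Symmetric per-tensor int8 quantization: store weights as int8 plus
    one float scale, roughly quartering storage versus float32."""
    scale = np.max(np.abs(weight)) / 127.0
    q = np.round(weight / scale).astype(np.int8)
    return q, scale

# Toy 8x4 layer: keeping half the channels halves the layer's parameters.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))
w_small = prune_channels(w, keep_ratio=0.5)
q, scale = quantize_int8(w_small)
print(w.shape, "->", w_small.shape)   # (8, 4) -> (4, 4)
```

In a real pipeline the pruned model would then be fine-tuned (here is where knowledge distillation from the original model helps accuracy recovery) before quantization is applied as a final step.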


The tool’s effectiveness is evaluated across diverse model architectures and configurations. Experimental results on a range of models and datasets demonstrate its ability to reduce model size by at least 5-fold, inference time by at least 1.5-fold, and the number of computations by at least 3-fold. Most compression rates are even higher, reaching 10-, 20-, and even 100-fold reductions.


The tool is available online at https://thesis.abelvansteenweghen.com.

Files

Thesis_Final.pdf
(pdf | 1.23 Mb)
License info not available