Analysis of the effect of caching convolutional network layers on resource-constrained devices

Abstract

Using transfer learning, convolutional neural networks built for different purposes can share similar layers, which can be cached and reused to reduce their load time. Four ways of loading and executing these layers (bulk, linear, DeepEye and partial loading) were analysed under different memory constraints and different numbers of similar networks. When sufficient memory is available, caching decreases the loading time and always benefits the single-threaded bulk and linear modes. For the multithreaded approaches this only holds when the loading time exceeds the execution time, which depends largely on which network is run. Under memory constraints, caching makes it possible to run multiple networks without much additional cost. Alternatively, a device can be fitted with less memory and still achieve the same results by combining transfer learning with caching.
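The core idea, caching layers that several transfer-learned networks have in common so they are loaded from storage only once, can be illustrated with a minimal sketch. The names below (LayerCache, load_layer_weights, the layer identifiers) are hypothetical and not taken from the thesis; the sketch only shows where the reduction in loading time comes from on a cache hit.

```python
import time
from typing import Dict


def load_layer_weights(layer_id: str) -> bytes:
    """Simulate loading one convolutional layer's weights from storage."""
    time.sleep(0.05)        # stand-in for disk/flash read latency
    return b"\x00" * 1024   # placeholder weight blob


class LayerCache:
    """In-memory cache of layer weights shared by several networks."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}

    def get(self, layer_id: str) -> bytes:
        # A cache hit skips the load entirely; shared layers are
        # read from storage only once across all networks.
        if layer_id not in self._cache:
            self._cache[layer_id] = load_layer_weights(layer_id)
        return self._cache[layer_id]


cache = LayerCache()
# Two networks that share their early (transfer-learned) layers:
net_a = ["conv1", "conv2", "conv3", "fc_a"]
net_b = ["conv1", "conv2", "conv3", "fc_b"]
for layer in net_a + net_b:
    cache.get(layer)        # conv1..conv3 are loaded only once
```

In practice the cache size would be bounded by the device's memory constraint, which is exactly the trade-off the four loading modes are evaluated against.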
