Evaluating and Improving Large-Scale Machine Learning Frameworks

Abstract

Given the increasing popularity of Machine Learning, and the ever-growing need to solve larger and more complex learning problems, it is unsurprising that numerous distributed learning strategies have been put forward in recent years, along with many large-scale Machine Learning frameworks. It is unclear, however, how well these strategies perform across different cluster and batch sizes, or what their hardware demands are, as there is little research in the public domain on this matter. Identifying the weaknesses and limitations of the parameter update strategies is nevertheless essential to increasing the efficiency of large-scale Machine Learning and making it commonplace. This thesis seeks to answer these questions and to provide evidence of the strategies' limitations and the root causes behind them. To make the study possible, the thesis examines particular implementations of the strategies within the TensorFlow and Caffe2 frameworks.