Training Neural Networks with Discrete Optimization Solvers
Abstract
Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, little research has gone into training NNs with MIP solvers. State-of-the-art methods for training NNs are typically gradient-based and require significant amounts of data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers should not require GPUs or hyper-parameter tuning, but likely cannot handle large amounts of data. This work builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models to increase the amount of data that can be feasibly used. We also extend current work to train non-binary, integer-valued networks. We show the potential benefits of optimizing NNs with regard to fairness and model compression. We also propose a novel batch training method to considerably increase the amount of training data that can be used. We conduct experiments to test our proposed methodology. The experimental results first show that we improve upon recent work and that results comparable to gradient-based methods can be achieved with minimal data. Second, we find that there is potential in optimizing with regard to fairness and model compression. Finally, our results show that batch training can be used to leverage more data, increase generalization, and reach results comparable to gradient-based training for integer-valued networks.
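To give a sense of what formulating NN training as a MIP looks like, the following is a minimal, illustrative sketch, not the formulation used in this work: a single-layer binarized classifier (weights restricted to {-1, +1}) trained by minimizing the number of misclassified samples. The toy data, the big-M constant `M`, the margin `eps`, and the use of PuLP with the bundled CBC solver are all assumptions made for illustration only.

```python
# Hedged sketch: training a binarized single-layer classifier with a MIP solver.
# All data and constants below are hypothetical; this is not the thesis' model.
import pulp

# Hypothetical toy data: 4 samples, 3 features, labels in {-1, +1}.
X = [[1.0, -0.5, 0.2],
     [-0.3, 0.8, -1.0],
     [0.6, 0.1, 0.4],
     [-0.9, -0.2, 0.7]]
y = [1, -1, 1, -1]
n_samples, n_features = len(X), len(X[0])

prob = pulp.LpProblem("binarized_nn_training", pulp.LpMinimize)

# Binary encoding of each weight: w_j = 2*b_j - 1, so w_j is in {-1, +1}.
b = [pulp.LpVariable(f"b_{j}", cat="Binary") for j in range(n_features)]
# Indicator z_i = 1 if sample i is allowed to be misclassified.
z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n_samples)]

M = 100.0   # big-M constant (assumed), large enough to relax a constraint
eps = 0.1   # required classification margin (assumed)

for i in range(n_samples):
    # Require y_i * (w . x_i) >= eps unless z_i = 1 deactivates the constraint.
    score = pulp.lpSum((2 * b[j] - 1) * X[i][j] for j in range(n_features))
    prob += y[i] * score >= eps - M * z[i]

# Objective: minimize the number of misclassified training samples.
prob += pulp.lpSum(z)
prob.solve(pulp.PULP_CBC_CMD(msg=False))

weights = [2 * pulp.value(b[j]) - 1 for j in range(n_features)]
print("learned weights:", weights, "train errors:", pulp.value(prob.objective))
```

This kind of model needs no GPU or learning-rate tuning, but its size grows with the number of samples, which is why the abstract emphasizes formulations and batch training that let more data be used feasibly.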