Exact Machine Learning: Improving Space and Speed of MaxSAT Solvers for Correlation Clustering

Master Thesis (2021)
Author(s)

M.A. Marchal (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Emir Demirović – Mentor (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Maxim Marchal
More Info
expand_more
Publication Year
2021
Language
English
Copyright
© 2021 Maxim Marchal
Graduation Date
27-08-2021
Awarding Institution
Delft University of Technology
Programme
['Computer Science | Data Science and Technology']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clustering is an important unsupervised learning task, with many applications in machine learning, computer vision, formal program verification and finance. Heuristic approaches such as local search are an excellent strategy for estimating the optimal solution, but they run the danger of getting stuck in local optima. Furthermore, it is non­trivial to add user­specific constraints or other objectives to these approaches without compromising their functionality, which makes them inflexible. Generic optimisa­ tion techniques such as MaxSAT and Integer Programming are more versatile, but they scale poorly on bigger datasets. Additionally, the fact that the search space of correlation clustering is inherently symmetric only exacerbates the scalability issue, especially for exhaustive search strategies. In this work, a novel customised search algorithm for correlation clustering is proposed. It aims to unite the flexibility of exhaustive optimisation techniques with an improved scalability and the possibility to declaratively specify domain specific knowledge in the form of a propagator interface. The algorithm uses a specialised hybrid MaxSAT solver with a symmetry­free encoding. Traditionally, this encoding would restrict scalability due to the prohibitive space complexity. However, this problem is avoided by using lazy data structures to represent key components of the encoding, which reduces the space requirement by several orders of magnitude. In addition, a novel encoding is presented that drastically reduces the runtime for sparse graphs. The algorithm employs incremental bounding techniques that under certain conditions reduce the number of decisions required by the solver to prove optimality.

Files

Thesis_Maxim_compile_report_ap... (pdf)
(pdf | 6.36 Mb)
- Embargo expired in 31-12-2021
License info not available