Clustering is an important unsupervised learning task, with many applications in machine learning, computer vision, formal program verification and finance. Heuristic approaches such as local search are an excellent strategy for estimating the optimal solution, but they run the d
...
Clustering is an important unsupervised learning task, with many applications in machine learning, computer vision, formal program verification and finance. Heuristic approaches such as local search are an excellent strategy for estimating the optimal solution, but they run the danger of getting stuck in local optima. Furthermore, it is nontrivial to add userspecific constraints or other objectives to these approaches without compromising their functionality, which makes them inflexible. Generic optimisa tion techniques such as MaxSAT and Integer Programming are more versatile, but they scale poorly on bigger datasets. Additionally, the fact that the search space of correlation clustering is inherently symmetric only exacerbates the scalability issue, especially for exhaustive search strategies. In this work, a novel customised search algorithm for correlation clustering is proposed. It aims to unite the flexibility of exhaustive optimisation techniques with an improved scalability and the possibility to declaratively specify domain specific knowledge in the form of a propagator interface. The algorithm uses a specialised hybrid MaxSAT solver with a symmetryfree encoding. Traditionally, this encoding would restrict scalability due to the prohibitive space complexity. However, this problem is avoided by using lazy data structures to represent key components of the encoding, which reduces the space requirement by several orders of magnitude. In addition, a novel encoding is presented that drastically reduces the runtime for sparse graphs. The algorithm employs incremental bounding techniques that under certain conditions reduce the number of decisions required by the solver to prove optimality.