Exploring the Impact of Different Clustering Algorithms on the Performance of Ensemble Learning-Based Mass Appraisal Models
Suleyman Sisman (Gebze Technical University)
Abdullah Kara (Gebze Technical University, TU Delft - Architecture and the Built Environment)
Arif C. Aydinoglu (Gebze Technical University)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Mass appraisal models are gaining use for improving valuation accuracy, yet their performance remains highly sensitive to how spatial and non-spatial data are structured before training. Clustering algorithms can be used to segment heterogeneous property groups into more homogeneous ones, potentially improving predictive performance. This study investigates the impact of different clustering algorithms, (i.e., K-Means, K-Medians and the Spatially Constrained Multivariate Clustering Algorithm (SCMCA)), on the performance of prominent ensemble learning-based mass appraisal models (i.e., Random Forest (RF), the Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost) and the Light Gradient Boosting Machine (LightGBM)). Using a comprehensive real estate dataset, clustering quality is evaluated using Silhouette, Calinski–Harabasz, and Davies–Bouldin indices, and the performance of cluster-based ensemble mass appraisal models is then compared. The findings indicate that the best performance is achieved with the SCMCA–LightGBM model combination, which reached RMSE = 0.061 and R2 = 0.722. Furthermore, it is determined that clustering-based models provide improvements of up to 7.26% in MAE, 10.61% in MAPE, and 8.40% in RMSE, depending on the combination. The results show that clustering is an effective preprocessing step that can substantially enhance the predictive performance and overall quality of mass appraisal models.