Exploring the Impact of Different Clustering Algorithms on the Performance of Ensemble Learning-Based Mass Appraisal Models

Journal Article (2026)

Author(s)

Suleyman Sisman (Gebze Technical University)

Abdullah Kara (Gebze Technical University, TU Delft - Architecture and the Built Environment)

Arif C. Aydinoglu (Gebze Technical University)

Research Group

Digital Technologies

Keywords

Machine learning, Artificial intelligence, GIS, Cluster analysis, Ensemble learning, Mass appraisal

DOI related publication

https://doi.org/10.3390/buildings16030615 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:75eb37c1-1a76-41d4-b581-f2d6b2a71b81

Publication Year

2026

Language

English

Journal title

Buildings

Issue number

3

Volume number

16

Article number

615

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Mass appraisal models are increasingly used to improve valuation accuracy, yet their performance remains highly sensitive to how spatial and non-spatial data are structured before training. Clustering algorithms can partition heterogeneous property groups into more homogeneous ones, potentially improving predictive performance. This study investigates the impact of three clustering algorithms (K-Means, K-Medians, and the Spatially Constrained Multivariate Clustering Algorithm (SCMCA)) on the performance of four prominent ensemble learning-based mass appraisal models (Random Forest (RF), the Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and the Light Gradient Boosting Machine (LightGBM)). Using a comprehensive real estate dataset, clustering quality is evaluated with the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices, and the performance of the cluster-based ensemble mass appraisal models is then compared. The best performance is achieved by the SCMCA–LightGBM combination, which reached RMSE = 0.061 and R² = 0.722. Clustering-based models also provide improvements of up to 7.26% in MAE, 10.61% in MAPE, and 8.40% in RMSE, depending on the combination. The results show that clustering is an effective preprocessing step that can substantially enhance the predictive performance and overall quality of mass appraisal models.
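The workflow summarised in the abstract (cluster the properties, score the clustering with the three indices, then compare ensemble models trained per cluster against a single global model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, all variable names are hypothetical, K-Means is the only clusterer shown (K-Medians and SCMCA are not reproduced), clustering uses coordinates only, and scikit-learn's GradientBoostingRegressor stands in for the four ensemble learners, since XGBoost and LightGBM live in separate packages.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    mean_absolute_error,
    mean_squared_error,
    silhouette_score,
)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for a property dataset (not the study's data).
# Two spatially separated submarkets with different price-per-area levels.
n = 600
submarket = rng.integers(0, 2, size=n)
x_coord = rng.normal(loc=submarket * 5.0, scale=1.0)
y_coord = rng.normal(loc=submarket * 5.0, scale=1.0)
area = rng.uniform(50, 200, size=n)
price = np.where(submarket == 0, 1.0, 2.5) * area + rng.normal(0, 10, size=n)

X = np.column_stack([x_coord, y_coord, area])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Step 1: cluster on location only (ASSUMPTION: n_clusters=2 is illustrative).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train[:, :2])
train_labels = kmeans.labels_
test_labels = kmeans.predict(X_test[:, :2])

# Step 2: clustering quality. Higher is better for Silhouette and
# Calinski-Harabasz; lower is better for Davies-Bouldin.
coords = X_train[:, :2]
cluster_quality = {
    "silhouette": silhouette_score(coords, train_labels),
    "calinski_harabasz": calinski_harabasz_score(coords, train_labels),
    "davies_bouldin": davies_bouldin_score(coords, train_labels),
}

# Step 3: one global model versus one model per cluster.
def fit_gbm(features, target):
    return GradientBoostingRegressor(random_state=0).fit(features, target)

global_pred = fit_gbm(X_train, y_train).predict(X_test)

cluster_pred = np.full(y_test.shape, np.nan)
for k in np.unique(train_labels):
    in_train = train_labels == k
    in_test = test_labels == k
    if in_test.any():  # skip clusters with no test samples
        model = fit_gbm(X_train[in_train], y_train[in_train])
        cluster_pred[in_test] = model.predict(X_test[in_test])

def report(y_true, y_pred):
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

scores = {
    "global": report(y_test, global_pred),
    "cluster-wise": report(y_test, cluster_pred),
}

print(cluster_quality)
for name, metrics in scores.items():
    print(name, metrics)
```

Whether the cluster-wise models beat the global one depends on the data and the learner; on this toy data the gradient boosting model can partly recover the submarket split from the coordinates on its own, so the gap may be small. Cluster labels for the test set come from the clusterer fitted on the training set, which keeps the evaluation free of test-set leakage.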

Files

Buildings-16-00615.pdf

(pdf | 12.4 Mb)