Inferring the number of floors of building footprints in the Netherlands

Abstract

Data on the number of floors is required for a variety of applications, ranging from energy demand estimation to flood response plans. Despite this, open data on the number of floors is currently not available at a nationwide level in the Netherlands. This means that it must be inferred from other available data. Automatic methods usually involve dividing the estimated height of a building by an assumed storey height. In some cases, this simple approach limits the accuracy of the results. Therefore, the goal of this thesis is to develop an alternative method to automatically infer the number of floors.
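The geometric baseline described above can be sketched in a few lines. This is a minimal illustration only: the function name `floors_from_height` and the default storey height of 3.0 m are assumptions made for the example, not values taken from the thesis.

```python
def floors_from_height(height_m, storey_height_m=3.0):
    """Estimate the number of floors by dividing height by storey height.

    storey_height_m is an ASSUMED value for illustration; the thesis
    does not state which storey height the geometric approach uses.
    """
    if height_m <= 0 or storey_height_m <= 0:
        raise ValueError("heights must be positive")
    # Round half up, and never return fewer than one floor.
    return max(1, int(height_m / storey_height_m + 0.5))


# A 9 m building with 3 m storeys gives 3 floors.
estimate = floors_from_height(9.0)
```

Because a single storey height is applied to every building, any building whose real storey height differs from the assumption (for example, one with a tall ground floor or a pitched roof) is systematically misestimated, which is the limitation the abstract refers to.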

Three machine learning algorithms are tested and compared: Random Forest, Gradient Boosting and Support Vector Regression. These algorithms are trained using data on the number of floors obtained from four municipalities in the Netherlands. In addition, 25 features are derived from cadastral attributes, building geometry and neighbourhood census data. These features are tested in different combinations to determine whether a specific subset yields better results. Furthermore, a comparison is made between features derived from 3D building models at different levels of detail.
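A comparison of this kind could be set up roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the thesis pipeline: the data are synthetic, only two stand-in features are used instead of 25, and the hyperparameters, the feature scaling for SVR and the 5-fold cross-validation are choices made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# SYNTHETIC data for illustration only; not the municipal training data.
rng = np.random.default_rng(0)
n = 300
floors = rng.integers(1, 6, size=n)               # 1 to 5 floors
height_p70 = floors * 3.0 + rng.normal(0, 0.5, n)  # stand-in height feature
net_area = floors * 60.0 + rng.normal(0, 20.0, n)  # stand-in cadastral feature
X = np.column_stack([height_p70, net_area])
y = floors.astype(float)

# The three algorithm families named in the abstract (assumed settings).
models = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler.
    "svr": make_pipeline(StandardScaler(), SVR()),
}


def compare_models(models, X, y, folds=5):
    """Return the cross-validated mean absolute error per model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(
            model, X, y, cv=folds, scoring="neg_mean_absolute_error"
        )
        results[name] = -scores.mean()  # flip sign: sklearn maximises scores
    return results


cv_mae = compare_models(models, X, y)
```

Testing feature subsets then amounts to calling `compare_models` with different column selections of `X`.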

The results show that building height, particularly the 70th percentile height, is the feature most strongly related to the number of floors. Other 3D geometric features, specifically roof area and volume, are also closely related to the number of floors. However, a higher level of detail did not improve the results. Cadastral features are also relevant, mainly net internal area and, to a lesser extent, construction year. Furthermore, models based on a combination of features from different categories performed better than models based on a single category of features.
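As an illustration of the height feature, the sketch below computes a 70th percentile height from a set of roof elevations. How the thesis derives this value is not stated in the abstract; the linear interpolation between ranks, the subtraction of a ground elevation and the names `percentile` and `height_p70` are all assumptions for the example.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def height_p70(roof_elevations_m, ground_elevation_m):
    """70th percentile of roof elevations, relative to ground level.

    ASSUMPTION: height is measured from a single ground elevation.
    """
    return percentile(roof_elevations_m, 70) - ground_elevation_m


# Hypothetical elevations (metres above datum) sampled on one roof.
sample_roof = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0]
sample_height = height_p70(sample_roof, ground_elevation_m=1.0)
```

A percentile height of this kind is less sensitive to chimneys and roof ridges than the maximum height, which is one plausible reason for it to track the number of floors closely.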

The best predictive model achieved an accuracy of 94.5% and a Mean Absolute Error (MAE) of 0.06 for buildings with 5 floors or fewer. This represented a substantial improvement on the results of the geometric approach, which had an accuracy of 69.9% and an MAE of 0.31. Above 5 floors, however, model performance was substantially lower, and machine learning provided only a slight improvement on the geometric approach: the best model had an accuracy of 52.3% and an MAE of 0.62, whereas the geometric approach had an accuracy of 47.5% and an MAE of 0.70. A comparison of the cumulative error distributions showed that the best model mainly increased the fraction of buildings predicted with an error of less than 1 floor. Overall, these results show that machine learning provides a better estimate of the number of floors than a purely geometric approach, although the gain is largely limited to buildings with 5 floors or fewer.
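The two reported metrics, and the split at 5 floors, can be sketched as follows. The definitions are assumptions, since the abstract does not spell them out: accuracy is taken as the fraction of buildings whose predicted floor count exactly matches the true count, and MAE as the mean absolute difference in floors. The sample values are made up for the example and are not results from the thesis.

```python
def evaluate(true_floors, predicted_floors):
    """Return accuracy (exact matches) and mean absolute error."""
    if len(true_floors) != len(predicted_floors) or not true_floors:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - t) for t, p in zip(true_floors, predicted_floors)]
    n = len(errors)
    return {
        "accuracy": sum(e == 0 for e in errors) / n,
        "mae": sum(errors) / n,
    }


def evaluate_by_range(true_floors, predicted_floors, threshold=5):
    """Evaluate separately for buildings at or below / above the threshold."""
    pairs = list(zip(true_floors, predicted_floors))
    low = [(t, p) for t, p in pairs if t <= threshold]
    high = [(t, p) for t, p in pairs if t > threshold]
    return {
        "low": evaluate([t for t, _ in low], [p for _, p in low]),
        "high": evaluate([t for t, _ in high], [p for _, p in high]),
    }


def fraction_within(true_floors, predicted_floors, max_error):
    """One point on the cumulative error distribution."""
    errors = [abs(p - t) for t, p in zip(true_floors, predicted_floors)]
    return sum(e <= max_error for e in errors) / len(errors)


# MADE-UP example values, not thesis results.
true_counts = [1, 2, 2, 3, 5, 7, 9, 12]
predicted_counts = [1, 2, 3, 3, 5, 6, 9, 14]
report = evaluate_by_range(true_counts, predicted_counts)
```

Calling `fraction_within` for increasing values of `max_error` traces out the cumulative error distribution used to compare the two approaches.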