Predicting ages of residential buildings from map data

Dr Julian Rosser and co-authors have a new article accepted in Computers, Environment and Urban Systems (CEUS). The paper describes a machine learning approach to inferring the age of residential buildings based on features extracted from map databases. Building age is not commonly available in the UK at the individual property level, yet such data is vital in estimating energy usage. The problem is treated as a supervised classification task in which a random forest is trained to estimate an age category (band) from the building’s shape and neighbourhood characteristics. Approaches for improving the model's predictive performance by exploiting the predicted class probabilities and the spatial and topological relations between buildings are then tested. Taking inspiration from graph-based techniques used in image segmentation, some spatial reasoning is introduced to post-process and improve the class predictions.
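As a rough illustration of the classification step, the sketch below trains a random forest with scikit-learn and tunes it by 5-fold cross-validation. It is not the authors' code: the features are synthetic stand-ins generated by `make_classification`, and the four classes, the parameter grid and the 25% hold-out split are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for per-building features (shape, neighbourhood, etc.).
# The real features come from map data and are not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)

# Hold back a separate sample for the final accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune hyper-parameters with 5-fold cross-validation on the training sample.
# The grid values are illustrative assumptions, not those used in the paper.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=5,
)
search.fit(X_train, y_train)

# Class probabilities per building (one column per age band) and the
# accuracy of the best cross-validated model on the held-out sample.
proba = search.predict_proba(X_test)
accuracy = search.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The per-class probabilities in `proba` are the kind of output that a later post-processing step could combine across neighbouring buildings.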

Further details are available in the paper. The full abstract is below.

The age of a building influences its form and fabric composition, and this in turn is critical to inferring its energy performance. However, often this data is unknown. In this paper, we present a methodology to automatically identify the construction period of houses, for the purpose of urban energy modelling and simulation. We describe two major stages to achieving this – a per-building classification model and post-classification analysis to improve the accuracy of the class inferences. In the first stage, we extract measures of the morphology and neighbourhood characteristics from readily available topographic mapping, a high-resolution Digital Surface Model and statistical boundary data. These measures are then used as features within a random forest classifier to infer an age category for each building. We evaluate various predictive model combinations based on scenarios of available data, evaluating these using 5-fold cross-validation to train and tune the classifier hyper-parameters based on a sample of city properties. A separate sample estimated the best performing cross-validated model as achieving 77% accuracy. In the second stage, we improve the inferred per-building age classification (for a spatially contiguous neighbourhood test sample) through aggregating prediction probabilities using different methods of spatial reasoning. We report on three methods for achieving this based on adjacency relations, near neighbour graph analysis and graph-cuts label optimisation. We show that post-processing can improve the accuracy by up to 8 percentage points.
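The idea of aggregating prediction probabilities over adjacent buildings can be sketched in a few lines of plain Python. This is a simplified assumption of how such a step might look, not the method from the paper: the age bands, the building identifiers, the probabilities and the `weight` parameter are all hypothetical, and the averaging rule is just one of many possible ways to use adjacency.

```python
def smooth_with_neighbours(probs, adjacency, weight=0.5):
    """Blend each building's class probabilities with its neighbours' mean.

    probs: building id -> list of class probabilities (hypothetical input).
    adjacency: building id -> list of adjacent building ids.
    weight: share given to the neighbours' mean (an assumed tuning knob).
    """
    smoothed = {}
    for building, p in probs.items():
        neighbours = [probs[n] for n in adjacency.get(building, []) if n in probs]
        if not neighbours:
            smoothed[building] = list(p)  # isolated building: leave unchanged
            continue
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        smoothed[building] = [
            (1 - weight) * own + weight * nb for own, nb in zip(p, mean)
        ]
    return smoothed


def predict_labels(probs, bands):
    """Pick the most probable age band for each building."""
    return {
        b: bands[max(range(len(p)), key=p.__getitem__)] for b, p in probs.items()
    }


# Hypothetical age bands and classifier output for a three-house terrace A-B-C.
bands = ["pre-1919", "1919-1945", "post-1945"]
probs = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.4, 0.5, 0.1],  # weakly predicted as a different band
    "C": [0.8, 0.1, 0.1],
}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

before = predict_labels(probs, bands)
after = predict_labels(smooth_with_neighbours(probs, adjacency), bands)
print(before["B"], "->", after["B"])  # 1919-1945 -> pre-1919
```

In this toy terrace, the middle house is relabelled to agree with its confidently classified neighbours, which mirrors the intuition that adjoining houses were often built in the same period.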

Article link (open access copy, before typesetting): https://www.researchgate.net/publication/326920098_Predicting_residential_building_age_from_map_data

Journal link (subscription required): https://doi.org/10.1016/j.compenvurbsys.2018.08.004