Machine learning in the housing sector

Machine Learning

The digital age has led to rapid growth in the size, speed and complexity of the data being generated. Traditional methods of storing and analysing data are increasingly being challenged and deemed obsolete. Consequently, distributed storage systems, such as Hadoop, and machine learning techniques have emerged that are better equipped to handle large, complex datasets.

Put simply, machine learning is a subset of artificial intelligence that enables a computer to learn from data rather than follow explicitly programmed rules. As a result, machine learning algorithms can automatically identify patterns and trends in data, providing valuable insights to their users.

Although machine learning is a fairly new technology, many industries now use it, including the healthcare, manufacturing, finance and retail sectors.[1] In this blog post, we discuss some ways in which machine learning can be applied to the housing sector.

As in other industries, digital transformation continues to push traditionally offline services online, expanding the amount of data available to housing associations. We see machine learning as a potential way for the housing sector to take advantage of the growing amount of information at its disposal, enabling housing associations to increase internal efficiency and to offer improved and targeted services to their clients.

Supervised and Unsupervised Learning

Generally, machine learning techniques can be split into two groups: supervised methods and unsupervised methods. The difference lies in whether target outputs (labels) are established beforehand. If so, the method is supervised: the model learns to map inputs to the known outputs. If not, the method is unsupervised: the model is given only inputs and learns from the similarities and differences between data points.

To make this clearer, two supervised methods and two unsupervised methods are described below, with examples of their application to the housing sector.


Regression

Regression is a supervised method used to predict numerical outputs. It is used when the variable being predicted is continuous.

A housing company, for example, can use regression to predict:

· annual income and expenditure
· the value of a property
· the number of tenants and customers
· the number of voids and arrears
· the amount of rent collected, etc.

Regression analysis can also be used to understand and compare the effects of independent variables on a dependent variable. For example, when analysing a property’s value, what effect do its size, location, local crime rate and so on have, and which variable has the greatest impact?
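As a rough illustration of regression, the sketch below fits a straight line that predicts a property's value from its floor area using ordinary least squares. The figures are made up for the example (and deliberately lie on a perfect line so the result is easy to check), and the names fit_line and predict_value are our own rather than part of any library. A real model would use many more properties and more than one variable.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: floor area (square metres) and value (thousands of pounds).
floor_area = [50, 65, 80, 95, 110]
value = [150, 180, 210, 240, 270]

intercept, slope = fit_line(floor_area, value)

def predict_value(area):
    """Predict a property's value (thousands of pounds) from its floor area."""
    return intercept + slope * area

print(f"Each extra square metre adds about {slope:.1f}k to the predicted value")
print(f"Predicted value of a 100 square metre property: {predict_value(100):.0f}k")
```

The slope is what answers the "what effect does size have?" question: it estimates how much the predicted value changes for each additional square metre.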


Classification

Another supervised method is classification, which is used to predict categorical (a.k.a. discrete) outputs. Classification algorithms learn from data that has already been assigned to predetermined classes, allowing predictions to be made about which class new data points fall into.

A housing company can use classification to predict:

· whether a tenant is likely to fall into arrears
· whether a tenant is likely to be satisfied with a service
· whether a tenant is likely to extend their contract
· whether a property is likely to be sold/let
· whether a property is likely to require maintenance, etc.
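To give a flavour of how classification works, here is a minimal sketch of one simple classifier, k-nearest neighbours, applied to the arrears example. It is only one of many possible classification algorithms. The tenant records, the two features and the classify function are all hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical labelled history:
# (late payments in the past year, rent as a share of income) -> outcome
training = [
    ((0, 0.20), "no arrears"),
    ((1, 0.25), "no arrears"),
    ((0, 0.30), "no arrears"),
    ((4, 0.45), "arrears"),
    ((5, 0.50), "arrears"),
    ((3, 0.55), "arrears"),
]

def classify(features, k=3):
    """Predict a label by majority vote among the k most similar tenants."""
    nearest = sorted(training, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Note: the features are on different scales here, so late payments dominate
# the distance. In practice the features would be rescaled first.
print(classify((4, 0.50)))
print(classify((0, 0.22)))
```

Because the historical outcomes are known in advance, this is a supervised method: a new tenant is assigned whichever label is most common among the most similar past tenants.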


Clustering

Clustering is an example of an unsupervised method, typically used for exploratory analysis. This means that, rather than making predictions, the goal is simply to gain a greater understanding of the data by discovering patterns or points of interest. Observations are grouped into clusters based on their similarities and differences.

A housing company can use this technique to understand its tenants and properties more fully. For example, tenants can be placed into groups based on similar characteristics, such as age, gender, occupation and length of time in the property. The housing company can then target these groups more efficiently, for marketing or maintenance purposes, for instance.

Likewise, properties can be clustered based on features like value, size, geographical location, type of property, and proximity to schools and other public services. This information can facilitate a better understanding of the market, for example by revealing which types of properties are in demand.
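The sketch below shows one common clustering algorithm, k-means, grouping tenants by age and length of time in the property. It is a bare-bones version under simplifying assumptions: the tenant data is invented, the number of clusters is fixed at two, and the starting centroids are picked by hand rather than at random. The function name k_means is our own.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from statistics import mean

def k_means(points, centroids, iterations=10):
    """Repeatedly assign points to their nearest centroid, then move each
    centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical tenants: (age, years in the property)
tenants = [(23, 1), (25, 2), (27, 1), (61, 15), (65, 20), (70, 18)]

clusters, centroids = k_means(tenants, centroids=[tenants[0], tenants[-1]])
for cluster, centre in zip(clusters, centroids):
    print(f"centre {centre}: {cluster}")
```

No labels are supplied at any point. The two groups (younger, recent tenants and older, long-standing tenants) emerge purely from the similarities in the data, which is what makes the method unsupervised.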

Outlier Detection

A second unsupervised method is outlier detection, which is a technique used to identify unusual or unexpected data points (anomalies). This method is useful for detecting fraud, criminal activity or behaviour that does not conform to the norm.

In the context of housing, examples include identifying unexpected spikes in voids or arrears, properties that are significantly overvalued or undervalued, and properties that require repairs unusually often.
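As a simple illustration of outlier detection, the sketch below flags an unexpected spike in monthly void counts using a modified z-score, which measures how far each value sits from the median relative to the median absolute deviation (MAD). This is one simple statistical approach among many. The monthly figures are invented, the function name find_outliers is our own, and the 3.5 cut-off is a commonly used rule of thumb rather than a fixed standard.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical, so this score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical number of void properties recorded in each month of a year.
monthly_voids = [12, 14, 13, 15, 12, 13, 14, 40, 13, 12, 14, 13]

print(find_outliers(monthly_voids))
```

Using the median rather than the mean keeps the spike itself from distorting the baseline it is being compared against, which matters when there are only a handful of observations.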


To conclude, machine learning is a growing field that is increasingly being used across various industries. This blog post has outlined ways in which supervised and unsupervised machine learning techniques, including regression, classification, clustering and outlier detection, can be applied to the housing sector.


by Daniel Fitton