Perfect Comps

Identifying Comparable Real Estate Properties using Machine Learning

Abstract

For a traditional, manual real estate appraisal, the appraiser is required to provide a number of comparable properties (the 'comps'). These comps act as a benchmark for the valuation and provide context in the final appraisal report. They are typically selected by hand from recent transactions within a ten-mile radius, a process biased by the appraiser's market knowledge and by the number of transactions in the area. To replace this process, we developed an automated comparable selection service that selects comps based on objective characteristics, without restricting itself to a small spatial and/or temporal slice of the market.
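As a rough illustration of what such feature-based selection could look like (the feature set, normalization, and distance metric below are assumptions for illustration, not the model the service actually uses), comps could be ranked by distance in a normalized feature space:

# Hypothetical sketch: rank candidate properties by similarity to a target
# property using objective characteristics. Feature names are assumptions.
import numpy as np

FEATURES = ["living_area_m2", "plot_area_m2", "build_year", "num_rooms"]

def select_comps(target, candidates, k=5):
    """Return the k candidate properties most similar to the target.

    `target` is a dict of feature values; `candidates` is a list of such dicts.
    """
    matrix = np.array([[c[f] for f in FEATURES] for c in candidates], dtype=float)
    # Normalize each feature so no single characteristic dominates the distance.
    mean, std = matrix.mean(axis=0), matrix.std(axis=0) + 1e-9
    normed = (matrix - mean) / std
    target_vec = (np.array([target[f] for f in FEATURES], dtype=float) - mean) / std
    # Euclidean distance in normalized feature space as a simple similarity proxy.
    distances = np.linalg.norm(normed - target_vec, axis=1)
    best = np.argsort(distances)[:k]
    return [candidates[i] for i in best]

Normalizing each feature keeps a single large-valued characteristic, such as plot area, from dominating the distance.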

Comparable selection has no objective ground truth, which complicates or even rules out many machine learning algorithms that might otherwise have been applied. Additionally, the service's output needs to be explainable to its users; it cannot be completely opaque. Finally, the service had to integrate with a streaming data platform, in which new data arrives continuously and must be incorporated into the service's model and output.

Our research phase focused on three questions: which algorithms can select comparable properties under the constraints of explainability and a streaming environment, how the output of the chosen algorithm can be explained to the user, and how a service can be built around the chosen model that consumes a stream of input data and generates a set of comparable properties on demand.

Our process followed an agile methodology, with two-week sprints in which we gradually expanded the service into a fully functional proof of concept. The main challenges arose while developing the logic to incrementally construct a database of real estate properties from the data stream; we resolved these by switching from a document store to an RDF database, which better matched the shape of the incoming data.
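As a minimal sketch of how such incremental construction could work (the rdflib library, the example vocabulary, and the message format are assumptions for illustration, not the project's actual stack), each streamed message can simply be appended to the graph as triples:

# Hypothetical sketch: fold streamed property updates into an RDF graph.
# The EX vocabulary and message fields are assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/realestate/")
graph = Graph()

def apply_update(message: dict) -> None:
    """Add the facts carried by one streamed message as triples."""
    prop = EX[message["property_id"]]
    graph.add((prop, RDF.type, EX.Property))
    for field, value in message.items():
        if field == "property_id":
            continue
        graph.add((prop, EX[field], Literal(value)))

# Partial updates about the same property accumulate in the graph over time.
apply_update({"property_id": "P123", "living_area_m2": 95})
apply_update({"property_id": "P123", "build_year": 1987})

Because an RDF graph is just a set of triples, a message that carries only part of a property's attributes can be merged directly, without the read-modify-write cycle a document store would require.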

The final product consists of several microservices, each of which handles part of the problem domain and can be scaled out independently. Users interact with the system through a REST API and a web front-end. The system was tested with both unit tests and end-to-end tests, while the model was refined by scoring its output on the closeness of features indicative of similarity.
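To give an idea of what the REST API could look like (the framework, route, and in-memory data below are illustrative assumptions, and the toy ranking merely stands in for the actual selection model), a comps endpoint might resemble:

# Hypothetical sketch of a comps endpoint; framework (FastAPI), route, and
# data are assumptions for illustration only.
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Stand-in for the property database built from the data stream.
PROPERTIES = {
    "P123": {"living_area_m2": 95.0, "build_year": 1987},
    "P456": {"living_area_m2": 102.0, "build_year": 1990},
    "P789": {"living_area_m2": 60.0, "build_year": 2015},
}

@app.get("/properties/{property_id}/comps")
def get_comps(property_id: str, k: int = 2):
    """Return the k most comparable properties for the given property."""
    if property_id not in PROPERTIES:
        raise HTTPException(status_code=404, detail="unknown property")
    target = PROPERTIES[property_id]

    # Toy ranking: smaller summed absolute feature difference = more comparable.
    # The real service would delegate to the comparable-selection model instead.
    def distance(other):
        return sum(abs(target[f] - other[f]) for f in target)

    ranked = sorted(
        (pid for pid in PROPERTIES if pid != property_id),
        key=lambda pid: distance(PROPERTIES[pid]),
    )
    return {"property_id": property_id, "comps": ranked[:k]}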

The current model used by the service is fairly simple and can most likely be improved once more data becomes available. Additionally, because there is no ground truth, it will be important to tune both the comparable selection and the explanation metrics in response to user testing.