The housing market is influenced by many interdependent variables, which makes it difficult to analyze with traditional statistical approaches. In this study, machine learning (ML) techniques are used to provide a deeper understanding of the Danish housing market, based on a dataset of sales cases provided by a leading Danish real estate agency. We propose an extreme gradient boosting model for sales price regression, together with feature importance techniques that provide insight into the parameters that matter most in the national housing market. The regression model, trained with grid search cross-validation for hyperparameter optimization, achieves an R² score of 0.84, an MAE of DKK 433,824, and an RMSE of DKK 675,817. Permutation-based feature importance identifies the parameters with the greatest impact on the sales price regression; the four highest-ranked features are (1) GisX (west/east location), (2) GisY (north/south location), (3) building area, and (4) construction year. The geographical distributions of price, building area, and plot area are illustrated with 2D partial dependence plots to enhance the understanding of market trends.
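As a minimal sketch of the evaluation ideas named in the abstract, the following standard-library Python computes R², MAE, and RMSE and estimates permutation-based feature importance as the mean drop in R² when one feature column is shuffled. Everything here is an assumption made for illustration: the toy data, the feature names `gis_x`, `building_area`, and `unused`, and the fixed linear `predict` function, which merely stands in for a trained gradient boosting model. It is not the study's pipeline or data.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R² when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
            permuted_score = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted_score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: three features in [0, 1), unrelated to the real dataset.
data_rng = random.Random(42)
feature_names = ["gis_x", "building_area", "unused"]
X = [[data_rng.random() for _ in feature_names] for _ in range(200)]


def predict(row):
    """Stand-in for a trained regressor; it ignores the third feature."""
    return 3.0 * row[0] + 1.0 * row[1]


# Targets are the stand-in model's output plus Gaussian noise.
y = [predict(row) + data_rng.gauss(0.0, 0.1) for row in X]
y_pred = [predict(row) for row in X]

print("R2:", r2_score(y, y_pred), "MAE:", mae(y, y_pred), "RMSE:", rmse(y, y_pred))

importances = permutation_importance(predict, X, y)
for name, value in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(name, value)
```

A feature that the model never uses receives an importance of zero, because shuffling it leaves every prediction unchanged. With a real model, one would more likely rely on library implementations such as scikit-learn's `GridSearchCV` and `sklearn.inspection.permutation_importance`, and evaluate on held-out data rather than on the data used for fitting.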