Predicting Which Dublin Properties Will Sell

A machine learning analysis of 13,320 listings: what actually drives a purchase?

Python · Pandas · scikit-learn · Machine Learning · Data Cleaning · Matplotlib
Year
2024
Context
Individual project · Predictive Analytics module · University College Cork
My role
Everything: solo project, cleaning to final models.
Links
Code available on GitHub

The Problem

Property developers face an expensive guessing game: which homes will actually sell? I took a dataset of 13,320 property listings and built models to predict purchase likelihood, and to identify which factors matter most.

The Approach

  1. Cleaned 13,320 messy records

    Converted text like "2 BED" into numbers, averaged size ranges like "400–500", standardised mixed units into square feet, and filled missing values sensibly (median for numbers, most-common for categories).

  2. Handled extreme values with the IQR method

    Capped unrealistic prices and sizes instead of deleting them, so no records were thrown away.

  3. Trained and tuned four models

    Logistic Regression, Random Forest, KNN, and Decision Tree, each tuned with GridSearchCV and cross-validation, then compared honestly on accuracy, precision and recall, not just the headline number.

  4. Found the uncomfortable truth in the results

    All models were good at spotting properties that won't sell but weak on the rarer "will sell" cases. This is a class-imbalance problem, which I documented openly along with proposed fixes (resampling, richer features such as proximity to amenities).
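
To make the cleaning and capping steps concrete, here is a minimal sketch, not the project's actual code. The column names (bedrooms, size, area_type, price), the sample rows, and the assumption that the non-square-foot unit is square metres are all hypothetical; the parsing rules, median and most-common fills, and IQR capping follow the description above.

```python
import re
import pandas as pd

SQFT_PER_SQ_METRE = 10.7639  # standard conversion factor

def parse_bedrooms(text):
    """Pull the first integer out of strings like '2 BED'."""
    match = re.search(r"\d+", str(text))
    return float(match.group()) if match else float("nan")

def parse_size(text):
    """Turn '1000', '400–500' or '100 Sq. Meter' into square feet."""
    text = str(text)
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return float("nan")
    value = sum(numbers) / len(numbers)  # a range becomes its midpoint
    # Assumption: anything labelled in metres is square metres.
    if "meter" in text.lower() or "metre" in text.lower():
        value *= SQFT_PER_SQ_METRE
    return value

def cap_iqr(series, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

def clean(df):
    out = df.copy()
    out["bedrooms"] = out["bedrooms"].map(parse_bedrooms)
    out["size_sqft"] = out["size"].map(parse_size)
    out = out.drop(columns="size")
    # Median for numeric columns, most common value for categories.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    for col in ["price", "size_sqft"]:
        out[col] = cap_iqr(out[col])
    return out

# Hypothetical sample rows, including one extreme price.
raw = pd.DataFrame({
    "bedrooms": ["2 BED", "3 BED", None, "4 BED", "3 BED"],
    "size": ["1000", "400–500", "100 Sq. Meter", "1200", "900"],
    "area_type": ["Apartment", None, "House", "Apartment", "Apartment"],
    "price": [50.0, 60.0, 55.0, 5000.0, 58.0],
})
cleaned = clean(raw)
print(cleaned)
```

Because capping clips values rather than filtering rows, the cleaned frame has the same number of rows as the input; only the extreme values move to the nearest fence.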
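
The tuning and comparison step can be sketched in the same spirit. This example runs on synthetic data generated with make_classification (about 80% "won't sell", 20% "will sell"), not the real listings, and the hyperparameter grid is an illustrative assumption rather than the one used in the project. It tunes a Decision Tree with GridSearchCV and reports recall per class, which is where a class imbalance tends to show up.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the listings: class 0 = won't sell, class 1 = will sell.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Hypothetical grid; the project's real search space is not given in the write-up.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best tree on held-out data, one metric per line.
y_pred = search.predict(X_test)
report = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision_sell": precision_score(y_test, y_pred, pos_label=1, zero_division=0),
    "recall_sell": recall_score(y_test, y_pred, pos_label=1, zero_division=0),
    "recall_no_sell": recall_score(y_test, y_pred, pos_label=0, zero_division=0),
}
print("best params:", search.best_params_)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting recall separately for each class is the design choice that matters here: a model can post a respectable accuracy while still missing many of the rarer "will sell" cases.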

The Evidence

Four models, before and after tuning: Decision Tree jumped from 63% to 75%.
Extreme prices capped, not deleted; the data keeps its shape.
No single factor dominates; purchases come from a mix of signals.

The Outcome

13,320

property records cleaned and analysed

4

models compared and tuned

0%

best accuracy achieved
