Predicting Which Dublin Properties Will Sell

A machine learning analysis of 13,320 listings: what actually drives a purchase?

Python · Pandas · scikit-learn · Machine Learning · Data Cleaning · Matplotlib

Year

2024

Context

Individual project · Predictive Analytics module · University College Cork

My role

Everything: a solo project, from data cleaning to final models.

Links

Code available on GitHub

The Problem

Property developers face an expensive guessing game: which homes will actually sell? I took a dataset of 13,320 property listings and built models to predict purchase likelihood, and to identify which factors matter most.

The Approach

1
Cleaned 13,320 messy records
Converted text like "2 BED" into numbers, averaged size ranges like "400–500", standardised mixed units into square feet, and filled missing values sensibly (median for numbers, most-common for categories).
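
The cleaning rules in this step can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (`bedrooms`, `area`, `area_type`), the unit labels in `SQFT_PER_UNIT`, and the sample rows are all assumptions made for the example.

```python
import re

import pandas as pd

# Hypothetical unit labels; the real dataset's spellings may differ.
SQFT_PER_UNIT = {
    "sq. meter": 10.7639,
    "sq. yards": 9.0,
    "acres": 43560.0,
}


def parse_bedrooms(text):
    """'2 BED' -> 2.0; NaN when the text holds no number."""
    match = re.search(r"\d+", str(text))
    return float(match.group()) if match else float("nan")


def parse_area_sqft(text):
    """Turn '1200', '400–500' or '100 Sq. Meter' into square feet."""
    text = str(text).strip().lower()
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return float("nan")
    value = sum(numbers) / len(numbers)  # a range becomes its midpoint
    for unit, factor in SQFT_PER_UNIT.items():
        if unit in text:
            return value * factor
    return value  # no unit given: assumed to be square feet already


# Made-up rows that mimic the kinds of mess described above.
raw = pd.DataFrame({
    "bedrooms": ["2 BED", "3 Bedroom", None, "4 BED"],
    "area": ["400–500", "100 Sq. Meter", "1200", None],
    "area_type": ["Plot", None, "Built-up", "Built-up"],
})

clean = pd.DataFrame({
    "bedrooms": raw["bedrooms"].map(parse_bedrooms),
    "area_sqft": raw["area"].map(parse_area_sqft),
    "area_type": raw["area_type"],
})

# Median for numeric columns, most common value for categorical ones.
for column in clean.columns:
    if pd.api.types.is_numeric_dtype(clean[column]):
        clean[column] = clean[column].fillna(clean[column].median())
    else:
        clean[column] = clean[column].fillna(clean[column].mode().iloc[0])

print(clean)
```

Parsing with a regular expression keeps the range and unit handling in one place, so a new format only needs one more entry in the conversion table.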
2
Handled extreme values with the IQR method
Capped unrealistic prices and sizes instead of deleting them, so no records were thrown away.
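
A minimal sketch of IQR capping, assuming the conventional 1.5 × IQR fences (the write-up does not state the multiplier used). The function name `cap_with_iqr` and the sample prices are illustrative only.

```python
import pandas as pd


def cap_with_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] without dropping any rows."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up prices with one extreme entry at the end.
prices = pd.Series([95.0, 110.0, 120.0, 130.0, 140.0, 150.0, 165.0, 2500.0],
                   name="price")
capped = cap_with_iqr(prices)

print(pd.DataFrame({"original": prices, "capped": capped}))
```

Only the extreme entry is pulled in to the upper fence; every other value, and the row count, stays the same.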
3
Trained and tuned four models
Tuned Logistic Regression, Random Forest, KNN, and Decision Tree with GridSearchCV and cross-validation, then compared them on accuracy, precision and recall rather than on headline accuracy alone.
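
The tuning loop could look roughly like this. It is a sketch under stated assumptions: the data is synthetic (from `make_classification`), and the parameter grids, the 5-fold setting, and the `accuracy` scoring choice are placeholders rather than the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the property data (assumption).
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Illustrative grids only; the project's actual grids are not given.
candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.1, 1.0, 10.0]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [4, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
    "Decision Tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated search over the grid, on training data only.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    predictions = search.predict(X_test)  # uses the refitted best estimator
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
        "recall": recall_score(y_test, predictions, zero_division=0),
    }

for name, scores in results.items():
    print(f"{name}: acc={scores['accuracy']:.2f} "
          f"prec={scores['precision']:.2f} rec={scores['recall']:.2f} "
          f"params={scores['best_params']}")
```

Reporting precision and recall for the positive class next to accuracy is what exposes a model that scores well simply by predicting the majority class.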
4
Found the uncomfortable truth in the results
All models were good at spotting properties that won't sell but weak at the rarer "will sell" cases. This is a class-imbalance problem, which I documented openly along with proposed fixes (resampling, and richer features such as proximity to amenities).
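
Resampling is listed here as a proposed fix, not something the project implemented, and the kind of resampling is not specified. As one possible reading, the sketch below shows random oversampling of the minority class with NumPy; the helper name `oversample_minority` and the toy arrays are hypothetical.

```python
import numpy as np


def oversample_minority(X, y, seed=0):
    """Randomly re-draw rows of smaller classes until all classes match the largest."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    X_parts, y_parts = [], []
    for label, count in zip(classes, counts):
        X_cls = X[y == label]
        if count < target:
            # Draw `target` rows with replacement from this class.
            picks = rng.choice(len(X_cls), size=target, replace=True)
            X_cls = X_cls[picks]
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy data: 8 "won't sell" rows (0) and 2 "will sell" rows (1).
X_toy = np.arange(20, dtype=float).reshape(10, 2)
y_toy = np.array([0] * 8 + [1] * 2)

X_bal, y_bal = oversample_minority(X_toy, y_toy)
print(np.bincount(y_bal))
```

Resampling of this kind should be applied to the training split only, so that the test set keeps the real class proportions and the reported recall stays honest.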

The Evidence

Four models, before and after tuning: Decision Tree jumped from 63% to 75%.

Extreme prices capped, not deleted; the data keeps its shape.

No single factor dominates; purchases come from a mix of signals.

The Outcome

13,320 property records cleaned and analysed

4 models compared and tuned

best accuracy achieved
