To demonstrate a complete machine learning pipeline, from data analysis and feature engineering to model training, evaluation, deployment, and interpretation, for predicting house sale prices with a real-world dataset.
In the real estate market, pricing a house accurately is critical. Overpricing leads to delays in sales; underpricing leads to revenue loss.
This project creates a model that predicts home sale prices and provides practical suggestions to homeowners on how they can increase their property’s value.
- Build an accurate regularized polynomial regression model for house price prediction.
- Reduce overfitting, while keeping bias in check, using Ridge (L2) regularization.
- Translate model insights into actionable advice for homeowners about upgrades and sales timing.
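
As a rough illustration of the first two objectives, the sketch below fits a degree-2 polynomial regression with Ridge regularization using scikit-learn. It is not the notebook's actual code: the data is synthetic, the column meanings only mirror the feature names listed in this README, and the polynomial degree and `alpha` value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the project's dataset): three numeric
# features loosely modeled on OverallQual, GrLivArea, and GarageCars.
rng = np.random.default_rng(0)
n = 500
overall_qual = rng.integers(1, 11, size=n)    # quality rating, 1-10
gr_liv_area = rng.uniform(600, 3500, size=n)  # living area in square feet
garage_cars = rng.integers(0, 4, size=n)      # garage capacity, 0-3 cars
X = np.column_stack([overall_qual, gr_liv_area, garage_cars])

# Made-up price formula with an interaction term plus noise.
y = (
    20_000 * overall_qual
    + 60 * gr_liv_area
    + 8_000 * garage_cars
    + 5 * overall_qual * gr_liv_area
    + rng.normal(0, 15_000, size=n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale, expand to degree-2 polynomial terms, then fit Ridge.
# alpha=1.0 is an assumed value; in practice it would be tuned,
# for example by cross-validation.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
test_r2 = model.score(X_test, y_test)  # R² on the held-out split
print(f"Test R² on synthetic data: {test_r2:.3f}")
```

Scaling before the polynomial expansion keeps the squared and interaction terms on comparable scales, so the Ridge penalty shrinks the coefficients more evenly.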
- Jupyter Notebook: `notebooks/Housing_Price_Modeling.ipynb`
- Clean dataset and feature engineering pipeline
- Visual breakdown of top price-driving features
- Deployable Streamlit Web App (coming soon)
- `data/`: Raw and processed datasets
- `notebooks/`: Full workflow from exploration to modeling
- `src/`: Scripts for modeling, preprocessing, and visualization
- `reports/figures/`: Plots and result images
- `README.md`, `requirements.txt`
- RMSE (Test Set): ~$21,000
- R² Score: ~0.91
- Top Predictive Features:
  - `OverallQual` (Overall material and finish quality)
  - `GrLivArea` (Above ground living area)
  - `GarageCars` (Garage capacity)
These features strongly influence house prices. Homes with high-quality kitchens, garages, and larger living spaces typically receive higher offers.
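
For readers unfamiliar with the two metrics reported above, the sketch below shows how RMSE and R² are defined, using only the Python standard library. The prices are hypothetical values made up for the example, and the helper names `rmse` and `r_squared` are illustrative rather than taken from this repository.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (dollars)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices and predictions (not results from the project).
actual = [200_000, 150_000, 320_000, 275_000]
predicted = [210_000, 140_000, 300_000, 280_000]

error = rmse(actual, predicted)
fit = r_squared(actual, predicted)
print(f"RMSE: ${error:,.0f}")
print(f"R²: {fit:.3f}")
```

An RMSE of about $21,000 therefore means predictions miss the sale price by roughly that amount in a root-mean-square sense, while an R² near 0.91 means the model accounts for about 91% of the variance in test-set prices.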
Homeowners can use these insights to decide:
- Whether to sell now based on property value predictions.
- Which home improvements will result in the highest ROI (return on investment).
Real estate agents and agencies can leverage the model to give better advice to clients and align listing prices with market expectations.
git clone https://github.com/DelphinKdl/home_price_prediction_using_regularized_polynomial_regression.git
cd home_price_prediction_using_regularized_polynomial_regression
pip install -r requirements.txt