📊 Student Math Score Prediction – A machine learning project that predicts students' math scores based on demographic and educational attributes. Utilizes Python, pandas, scikit-learn, and data validation with dataclasses to ensure high-quality inputs. Ideal for educational analytics and predictive modeling.

sujeetgund/mlproject-udemy

ML Project - Student Performance Predictor

This project aims to predict student performance based on various factors such as Gender, Ethnicity, Parental Level of Education, Lunch, and Test Preparation Course. The goal is to build a robust Machine Learning model using Python to predict student scores.

Project Overview

This project builds a Student Performance Predictor using machine learning. It covers data ingestion, data transformation, model training, and prediction, with the training and prediction steps wired together as pipelines.

Dataset

Installation

To get started, follow these steps:

  1. Clone the repository:
git clone https://github.com/sujeetgund/mlproject-udemy.git
cd mlproject-udemy
  2. Create a virtual environment and activate it:
python -m venv env
source env/bin/activate # For Linux/macOS
env\Scripts\activate # For Windows
  3. Install the dependencies:
pip install -r requirements.txt
  4. Install the project using:
pip install -e .

After installation, a folder named ml_project_udemy.egg-info will be created.

Project Structure

mlproject-udemy/
│
├── README.md
├── setup.py
├── requirements.txt
├── logs/
│   └── *.txt # Log files
├── artifacts/
│   ├── raw_data.csv
│   ├── train.csv
│   ├── test.csv
│   ├── preprocessor.pkl # Saved preprocessor after transformation
│   └── model.pkl # Trained model file
├── notebooks/
│   ├── eda.ipynb
│   ├── model_training.ipynb
│   └── data/
│       └── stud.csv
├── ml_project_udemy.egg-info/
├── src/
│   ├── __init__.py
│   ├── logger.py
│   ├── exception.py
│   ├── utils.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   └── pipelines/
│       ├── __init__.py
│       ├── train_pipeline.py
│       └── prediction_pipeline.py
└── streamlit_app.py

Description of Main Modules:

  • logger.py: Handles logging for tracking events, stored in the logs folder.
  • exception.py: Custom exception handling.
  • utils.py: Utility functions for data processing.
  • data_ingestion.py: Handles data loading. After running, the artifacts folder will contain:
    • raw_data.csv: The original dataset.
    • train.csv: Training data split.
    • test.csv: Testing data split.
  • data_transformation.py: Prepares and transforms data for modeling. After running, it generates:
    • preprocessor.pkl: The saved preprocessor object.
    • Transformed train and test data arrays.
  • model_trainer.py: Trains multiple machine learning models, selects the best one based on R² score, and saves it as model.pkl.
  • train_pipeline.py: End-to-end pipeline for training.
  • prediction_pipeline.py: Pipeline for making predictions. You can modify prediction_pipeline.py to use different student data for predictions.
  • streamlit_app.py: Interactive web app using Streamlit to input custom data and get predictions.
  • notebooks/eda.ipynb: Exploratory Data Analysis notebook.
  • notebooks/model_training.ipynb: Model training and evaluation notebook.
  • notebooks/data/stud.csv: Student performance dataset.
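The transformation step described for data_transformation.py could look roughly like the sketch below. It is not taken from the repository: the use of scikit-learn's ColumnTransformer, the choice of StandardScaler and OneHotEncoder, the helper name build_preprocessor, and serialization with pickle are all assumptions made for illustration. The column names follow the input fields listed elsewhere in this README.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Feature columns, named after the input fields used in this README.
CATEGORICAL = [
    "gender",
    "race_ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
]
NUMERIC = ["reading_score", "writing_score"]


def build_preprocessor():
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


# Tiny stand-in for train.csv (made-up rows, for illustration only).
train_df = pd.DataFrame(
    {
        "gender": ["male", "female", "female"],
        "race_ethnicity": ["group B", "group C", "group A"],
        "parental_level_of_education": ["some college", "bachelor's degree", "high school"],
        "lunch": ["standard", "free/reduced", "standard"],
        "test_preparation_course": ["none", "completed", "none"],
        "reading_score": [72, 88, 65],
        "writing_score": [83, 92, 70],
    }
)

preprocessor = build_preprocessor()
X_train = preprocessor.fit_transform(train_df)

# Serialize the fitted preprocessor; a real run would write these bytes
# to artifacts/preprocessor.pkl instead of keeping them in memory.
payload = pickle.dumps(preprocessor)
restored = pickle.loads(payload)
print(X_train.shape)
```

Saving the fitted preprocessor alongside the model lets the prediction pipeline apply exactly the same scaling and encoding to new records.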

Usage

Running the Full Project

To run the full project, execute:

python src/pipelines/train_pipeline.py

This will handle data ingestion, transformation, and model training.
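As a rough illustration of the model-selection part of training (fit several regressors, keep the one with the highest R² on held-out data), consider the sketch below. The candidate models, the helper name select_best_model, and the synthetic data are assumptions for illustration; the actual candidates and evaluation in model_trainer.py may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit each candidate and rank by test-set R2."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, models[best_name], scores


# Synthetic stand-in for the transformed arrays (not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = 0.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

best_name, best_model, scores = select_best_model(
    candidates, X_train, y_train, X_test, y_test
)
print(best_name, round(scores[best_name], 3))
```

The returned best_model is the object that would then be serialized as model.pkl.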

Running Predictions

To make predictions, execute:

python src/pipelines/prediction_pipeline.py

If you want to predict using different student data, modify the following section inside prediction_pipeline.py:

students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        ),
        StudentExamRecord(
            gender="female",
            race_ethnicity="group C",
            parental_level_of_education="bachelor's degree",
            lunch="free/reduced",
            test_preparation_course="completed",
            reading_score=88,
            writing_score=92,
        ),
    ]
)
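For readers wondering how such records might be defined, here is a minimal sketch using dataclasses. The class names StudentExamRecord and CustomData and the field names come from the snippet above; the class bodies and the to_rows method are assumptions for illustration, not the repository's actual code.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class StudentExamRecord:
    """One student's input features (fields mirror the snippet above)."""

    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: int
    writing_score: int


@dataclass
class CustomData:
    """A batch of records to be scored together."""

    records: List[StudentExamRecord] = field(default_factory=list)

    def to_rows(self) -> List[Dict[str, object]]:
        """Hypothetical helper: one plain dict per record, keyed by column name."""
        return [asdict(record) for record in self.records]


students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        )
    ]
)
rows = students_data.to_rows()
print(rows[0]["gender"], rows[0]["reading_score"])
```

A list of dicts like rows can be passed directly to pandas.DataFrame(rows) to obtain the tabular input a fitted preprocessor expects.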

Streamlit App

You can launch a user-friendly interface using Streamlit:

streamlit run streamlit_app.py

This app allows you to:

  • Input custom student data through a form.
  • Get predicted math scores using the trained model.
  • Trigger model training from the UI.

Valid Values for Input Fields:

  • Gender: male, female
  • Race/Ethnicity: group A, group B, group C, group D, group E
  • Parental Level of Education: some high school, high school, some college, associate's degree, bachelor's degree, master's degree
  • Lunch: standard, free/reduced
  • Test Preparation Course: none, completed
  • Reading & Writing Scores: Integer values between 0 and 100
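The valid values above can be checked before a record reaches the model. The sketch below is one possible way to do so with plain Python; the function name validate_record and the error messages are hypothetical, and the project's own validation may be implemented differently (for example inside its dataclasses).

```python
from typing import Dict, List

# Allowed categorical values, copied from the list above.
ALLOWED = {
    "gender": {"male", "female"},
    "race_ethnicity": {"group A", "group B", "group C", "group D", "group E"},
    "parental_level_of_education": {
        "some high school",
        "high school",
        "some college",
        "associate's degree",
        "bachelor's degree",
        "master's degree",
    },
    "lunch": {"standard", "free/reduced"},
    "test_preparation_course": {"none", "completed"},
}
SCORE_FIELDS = ("reading_score", "writing_score")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Hypothetical helper: return a list of problems; empty means valid."""
    errors = []
    for name, allowed in ALLOWED.items():
        if record.get(name) not in allowed:
            errors.append(f"{name}: unexpected value {record.get(name)!r}")
    for name in SCORE_FIELDS:
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 100:
            errors.append(f"{name}: expected an integer between 0 and 100")
    return errors


good = {
    "gender": "female",
    "race_ethnicity": "group C",
    "parental_level_of_education": "bachelor's degree",
    "lunch": "free/reduced",
    "test_preparation_course": "completed",
    "reading_score": 88,
    "writing_score": 92,
}
bad = dict(good, lunch="premium", reading_score=140)

print(validate_record(good))
print(validate_record(bad))
```

Collecting all problems in a list, rather than raising on the first one, makes it easy to show every invalid field at once in a form-based UI.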

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to your branch (git push origin feature/YourFeature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
