ML Project - Student Performance Predictor

This project predicts a student's math score from factors such as Gender, Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, and reading and writing scores. The goal is to build a robust Machine Learning model in Python.

Project Overview

This project builds a Student Performance Predictor using Machine Learning techniques. It covers data ingestion, data transformation, model training, and prediction, organized into a training pipeline and a prediction pipeline.

Dataset

Source: Kaggle - Student Performance Dataset
Size: 8 columns, 1000 rows
Features: Gender, Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, and Test Scores (math, reading, and writing)

Installation

To get started, follow these steps:

Clone the repository:

git clone https://github.com/sujeetgund/mlproject-udemy.git
cd mlproject-udemy

Create a virtual environment and activate it:

python -m venv env
source env/bin/activate # For Linux/macOS
env\Scripts\activate # For Windows

Install the dependencies:

pip install -r requirements.txt

Install the project using:

pip install -e .

After installation, a folder named ml_project_udemy.egg-info will be created.

Project Structure

mlproject-udemy/
│
├── README.md
├── setup.py
├── requirements.txt
├── logs/
│   └── *.txt # Log files
├── artifacts/
│   ├── raw_data.csv
│   ├── train.csv
│   ├── test.csv
│   ├── preprocessor.pkl # Saved preprocessor after transformation
│   └── model.pkl # Trained model file
├── notebooks/
│   ├── eda.ipynb
│   ├── model_training.ipynb
│   └── data/
│       └── stud.csv
├── ml_project_udemy.egg-info/
├── src/
│   ├── __init__.py
│   ├── logger.py
│   ├── exception.py
│   ├── utils.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   └── pipelines/
│       ├── __init__.py
│       ├── train_pipeline.py
│       └── prediction_pipeline.py
└── streamlit_app.py

Description of Main Modules:

logger.py: Handles logging for tracking events, stored in the logs folder.
exception.py: Custom exception handling.
utils.py: Utility functions for data processing.
data_ingestion.py: Handles data loading. After running, the artifacts folder will contain:
- raw_data.csv: The original dataset.
- train.csv: Training data split.
- test.csv: Testing data split.
data_transformation.py: Prepares and transforms data for modeling. After running, it generates:
- preprocessor.pkl: The saved preprocessor object.
- Transformed train and test data arrays.
model_trainer.py: Trains multiple machine learning models, selects the one with the highest R² score, and saves it as model.pkl.
train_pipeline.py: End-to-end pipeline for training.
prediction_pipeline.py: Pipeline for making predictions. Edit the sample records in this file to predict for different students.
streamlit_app.py: Interactive web app using Streamlit to input custom data and get predictions.
notebooks/eda.ipynb: Exploratory Data Analysis notebook.
notebooks/model_training.ipynb: Model training and evaluation notebook.
notebooks/data/stud.csv: Student performance dataset.
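
The selection step described for model_trainer.py can be sketched as follows. This is a minimal, dependency-free illustration and not the repository's code: r2_score here is a hand-written helper (scikit-learn provides sklearn.metrics.r2_score for the same metric), and the candidate names, scores, and predictions are made up for the example.

```python
from typing import Dict, List, Tuple


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best(
    y_true: List[float], predictions: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Score every candidate and return the name and R² of the best one."""
    scores = {name: r2_score(y_true, preds) for name, preds in predictions.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical held-out math scores and predictions from two candidates.
y_test = [60.0, 70.0, 80.0, 90.0]
candidate_predictions = {
    "model_a": [62.0, 69.0, 81.0, 88.0],
    "model_b": [75.0, 75.0, 75.0, 75.0],  # always predicts the mean
}

best_name, best_score = select_best(y_test, candidate_predictions)
print(best_name, round(best_score, 3))
```

A model that always predicts the mean scores an R² of 0, so any useful candidate should score above that.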

Usage

Running the Full Project

To run the full project, execute:

python src/pipelines/train_pipeline.py

This runs data ingestion, data transformation, and model training in sequence and writes the outputs to the artifacts folder.
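
The ingestion step produces separate train and test files before anything else runs. A minimal sketch of such a split, using only the standard library, is shown here. The 80/20 ratio, the fixed seed, and the split_rows helper are assumptions for illustration; the repository may use a different ratio or scikit-learn's train_test_split instead.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[dict], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 1000 rows of the dataset (hypothetical content).
rows = [{"student_id": i} for i in range(1000)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```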

Running Predictions

To make predictions, execute:

python src/pipelines/prediction_pipeline.py

To predict using different student data, modify the following section inside prediction_pipeline.py:

students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        ),
        StudentExamRecord(
            gender="female",
            race_ethnicity="group C",
            parental_level_of_education="bachelor's degree",
            lunch="free/reduced",
            test_preparation_course="completed",
            reading_score=88,
            writing_score=92,
        ),
    ]
)
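
The snippet above relies on CustomData and StudentExamRecord, whose definitions are not shown in this README. As a rough idea of how such containers could work, here is a hypothetical sketch using dataclasses. The field names mirror the keyword arguments above, but the to_rows method and the rest of the implementation are assumptions, and the actual classes in the repository may differ.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class StudentExamRecord:
    """One student's inputs; fields match the keyword arguments above."""
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: int
    writing_score: int


@dataclass
class CustomData:
    """Container for a batch of records."""
    records: List[StudentExamRecord]

    def to_rows(self) -> List[Dict[str, object]]:
        # Hypothetical helper: one plain dict per student.
        return [asdict(record) for record in self.records]


students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        ),
    ]
)

rows = students_data.to_rows()
print(rows[0])
```

A list of dicts like this can be passed directly to pandas.DataFrame to build the table a saved preprocessor would expect.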

Streamlit App

You can launch a user-friendly interface using Streamlit:

streamlit run streamlit_app.py

This app allows you to:

Input custom student data through a form.
Get predicted math scores using the trained model.
Trigger model training from the UI.

Valid Values for Input Fields:

Gender: male, female
Race/Ethnicity: group A, group B, group C, group D, group E
Parental Level of Education: some high school, high school, some college, associate's degree, bachelor's degree, master's degree
Lunch: standard, free/reduced
Test Preparation Course: none, completed
Reading & Writing Scores: Integer values between 0 and 100
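
Since each field accepts only the values listed above, it can help to check a record before requesting a prediction. The following is a minimal sketch of such a check; VALID_VALUES, SCORE_FIELDS, and validate_record are hypothetical names and are not part of the repository.

```python
from typing import Dict, List, Set

# Allowed category labels, copied from the list above.
VALID_VALUES: Dict[str, Set[str]] = {
    "gender": {"male", "female"},
    "race_ethnicity": {"group A", "group B", "group C", "group D", "group E"},
    "parental_level_of_education": {
        "some high school",
        "high school",
        "some college",
        "associate's degree",
        "bachelor's degree",
        "master's degree",
    },
    "lunch": {"standard", "free/reduced"},
    "test_preparation_course": {"none", "completed"},
}
SCORE_FIELDS = ("reading_score", "writing_score")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, allowed in VALID_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    for field in SCORE_FIELDS:
        score = record.get(field)
        # bool is a subclass of int, so it is rejected explicitly.
        is_int = isinstance(score, int) and not isinstance(score, bool)
        if not is_int or not 0 <= score <= 100:
            errors.append(f"{field}: {score!r} is not an integer between 0 and 100")
    return errors


good = {
    "gender": "female",
    "race_ethnicity": "group C",
    "parental_level_of_education": "bachelor's degree",
    "lunch": "free/reduced",
    "test_preparation_course": "completed",
    "reading_score": 88,
    "writing_score": 92,
}
bad = dict(good, lunch="premium", reading_score=120)

print(validate_record(good))
print(len(validate_record(bad)))
```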

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature/YourFeature).
Commit your changes (git commit -m 'Add some feature').
Push to your branch (git push origin feature/YourFeature).
Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
