This project predicts student performance from factors such as gender, ethnicity, parental level of education, lunch type, and test preparation course. The goal is to build a robust machine learning model in Python that predicts a student's math score.
The Student Performance Predictor is organized as a sequence of pipeline steps: data ingestion, data transformation, model training, and prediction.
- Source: Kaggle - Student Performance Dataset
- Size: 8 columns, 1000 rows
- Features: Gender, Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, and Test Scores
To get started, follow these steps:
- Clone the repository:
git clone https://github.com/sujeetgund/mlproject-udemy.git
cd mlproject-udemy
- Create a virtual environment and activate it:
python -m venv env
source env/bin/activate # For Linux/macOS
env\Scripts\activate # For Windows
- Install the dependencies:
pip install -r requirements.txt
- Install the project using:
pip install -e .
After installation, a folder named ml_project_udemy.egg-info will be created.
mlproject-udemy/
│
├── README.md
├── setup.py
├── requirements.txt
├── logs/
│   └── *.txt                  # Log files
├── artifacts/
│   ├── raw_data.csv
│   ├── train.csv
│   ├── test.csv
│   ├── preprocessor.pkl       # Saved preprocessor after transformation
│   └── model.pkl              # Trained model file
├── notebooks/
│   ├── eda.ipynb
│   ├── model_training.ipynb
│   └── data/
│       └── stud.csv
├── ml_project_udemy.egg-info/
├── src/
│   ├── __init__.py
│   ├── logger.py
│   ├── exception.py
│   ├── utils.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   └── pipelines/
│       ├── __init__.py
│       ├── train_pipeline.py
│       └── prediction_pipeline.py
└── streamlit_app.py
- logger.py: Handles logging for tracking events; log files are stored in the logs folder.
- exception.py: Custom exception handling.
- utils.py: Utility functions for data processing.
- data_ingestion.py: Handles data loading. After running, the artifacts folder will contain:
  - raw_data.csv: The original dataset.
  - train.csv: Training data split.
  - test.csv: Testing data split.
- data_transformation.py: Prepares and transforms data for modeling. After running, it generates:
  - preprocessor.pkl: The saved preprocessor object.
  - Transformed train and test data arrays.
- model_trainer.py: Trains multiple machine learning models, selects the best one based on R2 score, and saves it as model.pkl.
- train_pipeline.py: End-to-end pipeline for training.
- prediction_pipeline.py: Pipeline for making predictions. You can modify prediction_pipeline.py to use different student data for predictions.
- streamlit_app.py: Interactive web app using Streamlit to input custom data and get predictions.
- notebooks/eda.ipynb: Exploratory Data Analysis notebook.
- notebooks/model_training.ipynb: Model training and evaluation notebook.
- notebooks/data/stud.csv: Student performance dataset.
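The "select the best one based on R2 score" step in model_trainer.py can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the function names `r2_score` and `select_best_model`, the model names, and the sample numbers are all hypothetical, and the real trainer presumably relies on a library implementation of the metric.

```python
from typing import Dict, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def select_best_model(
    predictions: Dict[str, Sequence[float]], y_true: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate's predictions and return the top name plus all scores."""
    scores = {name: r2_score(y_true, y_pred) for name, y_pred in predictions.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical held-out math scores and two candidates' predictions.
y_test = [60, 70, 80, 90]
candidate_predictions = {
    "mean_baseline": [75, 75, 75, 75],  # always predicts the mean, so R2 is 0
    "close_fit": [62, 69, 81, 88],
}
best_name, scores = select_best_model(candidate_predictions, y_test)
```

A model that always predicts the mean scores 0, so any candidate worth saving should score clearly above that.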
To run the full project, execute:
python src/pipelines/train_pipeline.py
This will handle data ingestion, transformation, and model training.
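As a rough illustration of the ingestion stage, the sketch below splits rows into train and test sets using only the standard library. The 80/20 ratio, the seed, and the function name `train_test_split_rows` are assumptions made for this example; the actual split is defined in data_ingestion.py and may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_rows(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows reproducibly and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 1000 dataset rows: integer row ids instead of real records.
all_rows = list(range(1000))
train_rows, test_rows = train_test_split_rows(all_rows)
```

Fixing the seed keeps train.csv and test.csv reproducible between runs, which makes model comparisons meaningful.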
To make predictions, execute:
python src/pipelines/prediction_pipeline.py
If you want to predict using different student data, modify the following section inside prediction_pipeline.py:
students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        ),
        StudentExamRecord(
            gender="female",
            race_ethnicity="group C",
            parental_level_of_education="bachelor's degree",
            lunch="free/reduced",
            test_preparation_course="completed",
            reading_score=88,
            writing_score=92,
        ),
    ]
)
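To make the shape of this input concrete, here is one way such record classes could be written with dataclasses. These are hypothetical stand-ins that only mirror the field names used above; the real `CustomData` and `StudentExamRecord` live in prediction_pipeline.py and may be implemented differently, and the `to_rows` helper is invented for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StudentExamRecord:
    """Hypothetical stand-in: one student's input features."""
    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: int
    writing_score: int


@dataclass
class CustomData:
    """Hypothetical stand-in: a batch of records to predict on."""
    records: List[StudentExamRecord]

    def to_rows(self) -> List[Dict[str, object]]:
        # One dict per student, keyed by column name, ready for a tabular loader.
        return [asdict(record) for record in self.records]


students_data = CustomData(
    records=[
        StudentExamRecord(
            gender="male",
            race_ethnicity="group B",
            parental_level_of_education="some college",
            lunch="standard",
            test_preparation_course="none",
            reading_score=72,
            writing_score=83,
        )
    ]
)
rows = students_data.to_rows()
```

Each resulting dict has one key per input column, which is the form a preprocessor trained on the CSV columns would typically expect.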
You can launch a user-friendly interface using Streamlit:
streamlit run streamlit_app.py
This app allows you to:
- Input custom student data through a form.
- Get predicted math scores using the trained model.
- Trigger model training from the UI.
- Gender: male, female
- Race/Ethnicity: group A, group B, group C, group D, group E
- Parental Level of Education: some high school, high school, some college, associate's degree, bachelor's degree, master's degree
- Lunch: standard, free/reduced
- Test Preparation Course: none, completed
- Reading & Writing Scores: Integer values between 0 and 100
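The accepted values listed above can be checked before a record reaches the model. The following is a minimal sketch, assuming records are plain dictionaries keyed by the field names used in prediction_pipeline.py; `ALLOWED_VALUES` and `validate_record` are hypothetical names, not part of the project.

```python
from typing import Dict, List

# Accepted categorical values, copied from the list above.
ALLOWED_VALUES = {
    "gender": ("male", "female"),
    "race_ethnicity": ("group A", "group B", "group C", "group D", "group E"),
    "parental_level_of_education": (
        "some high school",
        "high school",
        "some college",
        "associate's degree",
        "bachelor's degree",
        "master's degree",
    ),
    "lunch": ("standard", "free/reduced"),
    "test_preparation_course": ("none", "completed"),
}
SCORE_FIELDS = ("reading_score", "writing_score")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {list(allowed)}")
    for field in SCORE_FIELDS:
        value = record.get(field)
        # bool is a subclass of int, so it is rejected explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 100:
            errors.append(f"{field}: {value!r} is not an integer between 0 and 100")
    return errors


valid_record = {
    "gender": "female",
    "race_ethnicity": "group C",
    "parental_level_of_education": "bachelor's degree",
    "lunch": "free/reduced",
    "test_preparation_course": "completed",
    "reading_score": 88,
    "writing_score": 92,
}
# Same record with a misspelled category and an out-of-range score.
invalid_record = dict(valid_record, lunch="Standard", reading_score=101)
```

Matching is case-sensitive in this sketch, so "Standard" is rejected; whether the project's own encoder tolerates such variants is not stated here.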
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/YourFeature).
- Commit your changes (git commit -m 'Add some feature').
- Push to your branch (git push origin feature/YourFeature).
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.