Medical Condition Classifier

Overview

The Medical Condition Classifier is a Natural Language Processing (NLP) project that predicts a patient's medical condition from a free-text description and recommends top-rated drugs for that condition. A machine learning classifier maps the text input to a condition, and a dataset of drug reviews is used to suggest the medications that reviewers rated most highly. A Streamlit web interface lets users enter a description and view the prediction and recommendations.

Dataset

The project utilizes the Drugs.com dataset (drugsComTrain_raw.csv), which contains:

Columns: uniqueID, drugName, condition, review, rating, date, usefulCount.
Size: 161,297 entries with 884 unique conditions and 3,436 unique drugs.
Key Usage:
- The review and condition columns are used to train the condition prediction model.
- The drugName, rating, and usefulCount columns are used to recommend top drugs for predicted conditions.

Methodology

Data Preprocessing:
- Text cleaning: Converts text to lowercase, removes non-alphabetic characters, and splits into words.
- Stopword removal: Eliminates common English stopwords using NLTK.
- Lemmatization: Reduces words to their base form using NLTK's WordNetLemmatizer.
- Vectorization: Transforms text into numerical features using TF-IDF Vectorizer.
Model Training:
- Algorithm: Passive Aggressive Classifier (a fast online linear classifier that handles large, sparse text features efficiently).
- Feature Extraction: TF-IDF Vectorizer converts preprocessed text into sparse matrices.
- Training Data: A train/test split of the Drugs.com dataset, with reviews as input and conditions as labels.
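A minimal sketch of the training step with scikit-learn, assuming default `TfidfVectorizer` and `PassiveAggressiveClassifier` settings and a 20% hold-out (the notebook's actual hyperparameters and split ratio are not stated here). The toy texts and labels are hypothetical stand-ins for cleaned reviews and their conditions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for cleaned reviews (inputs) and conditions (labels).
texts = [
    "bad headache sensitivity light",
    "migraine throbbing headache nausea",
    "blood sugar high insulin",
    "glucose level dropped metformin",
    "feel sad hopeless sleep",
    "mood low energy anxious",
] * 5
labels = [
    "Migraine", "Migraine",
    "Diabetes, Type 2", "Diabetes, Type 2",
    "Depression", "Depression",
] * 5

# Hold out 20% of rows; stratify keeps every condition in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Fit TF-IDF on the training text only, then reuse its vocabulary for the test set.
vectorizer = TfidfVectorizer()
tfidf_train = vectorizer.fit_transform(X_train)  # sparse matrix
tfidf_test = vectorizer.transform(X_test)

model = PassiveAggressiveClassifier(max_iter=1000, random_state=0)
model.fit(tfidf_train, y_train)

accuracy = accuracy_score(y_test, model.predict(tfidf_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Because the toy rows are duplicated, the test split overlaps the training split, so the printed accuracy only demonstrates the mechanics, not real model quality. Fitting the vectorizer on the training split alone keeps test-set vocabulary statistics out of training.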
Drug Recommendation:
- Filters drugs with ratings ≥9 and useful counts ≥100.
- Sorts by rating and useful count, then selects the top three unique drugs for the predicted condition.
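The recommendation rule can be expressed with pandas roughly as below. The function name `top_drugs` and the sample rows are hypothetical; the column names come from the dataset, and the thresholds and sort order follow the description above.

```python
import pandas as pd


def top_drugs(df: pd.DataFrame, condition: str, n: int = 3,
              min_rating: int = 9, min_useful: int = 100) -> list:
    """Return up to n unique drug names for `condition`, best-rated first."""
    mask = (
        (df["condition"] == condition)
        & (df["rating"] >= min_rating)
        & (df["usefulCount"] >= min_useful)
    )
    # Highest rating first; ties broken by the larger usefulCount.
    ranked = df[mask].sort_values(["rating", "usefulCount"], ascending=False)
    # Keep each drug's best-ranked row only, then take the top n names.
    return ranked["drugName"].drop_duplicates().head(n).tolist()


# Hypothetical sample rows with the dataset's column names.
reviews = pd.DataFrame({
    "drugName": ["Drug A", "Drug B", "Drug A", "Drug C", "Drug D", "Drug E", "Drug F"],
    "condition": ["Migraine"] * 6 + ["Depression"],
    "rating": [10, 10, 9, 9, 8, 10, 10],
    "usefulCount": [300, 150, 500, 120, 1000, 50, 400],
})

print(top_drugs(reviews, "Migraine"))  # ['Drug A', 'Drug B', 'Drug C']
```

`drop_duplicates` keeps the first (highest-ranked) row for each drug, so a drug with several qualifying reviews is listed only once. Drug D (rating 8) and Drug E (usefulCount 50) are excluded by the thresholds.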
Web Application:
- Built with Streamlit for a user-friendly interface.
- Users input a condition description, and the app displays the predicted condition and recommended drugs.

Installation

To run this project locally, follow these steps:

Clone the Repository:

```
git clone https://github.com/your-username/medical-condition-classifier.git
cd medical-condition-classifier
```

Install Dependencies: Ensure Python 3.8+ is installed, then install required packages:
```
pip install -r requirements.txt
```
The requirements.txt should include:
```
pandas
numpy
nltk
scikit-learn
streamlit
joblib
matplotlib
seaborn
wordcloud
```
Download NLTK Data: Run the following in a Python shell:
```
import nltk
nltk.download('popular')
```
Download the Dataset:
- Download drugsComTrain_raw.csv from Kaggle (Drugs.com drug review dataset).
- Place it in the dataset folder so the path is dataset/drugsComTrain_raw.csv.
Run the Application: Start the Streamlit app:
```
streamlit run app.py
```
Open the provided local URL (e.g., http://localhost:8501) in a browser.

Usage

Launch the App: Run streamlit run app.py and access the web interface.
Input Description: Enter a patient condition description in the text area (e.g., "I have a headache and fever").
Predict: Click the "Predict Condition" button to view the predicted condition and top drug recommendations.
View Results:
- The predicted condition is displayed prominently.
- Up to three top-rated drugs are listed.

Files

app.py: Main Streamlit application script for the web interface.
Medical Condition Classifier.ipynb: Jupyter Notebook with data exploration, preprocessing, model training, and evaluation.
tfidf_vectorizer.pkl: Fitted TF-IDF vectorizer, saved so new input is transformed with the training vocabulary.
model.pkl: Trained Passive Aggressive Classifier.
dataset/drugsComTrain_raw.csv: Dataset file (not included in the repository; must be downloaded).
requirements.txt: List of Python dependencies.
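The two .pkl files pair a fitted vectorizer with the classifier trained on its features, so they must be loaded together. The sketch below assumes both are written and read with `joblib` (listed in requirements.txt); the helper names `save_artifacts`, `load_artifacts`, and `predict_condition` are hypothetical, and the tiny training set is a stand-in for the notebook's real model. In app.py, the input description would presumably pass through the same preprocessing as the training text before `transform` is called.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier


def save_artifacts(vectorizer, model, out_dir):
    """Write the fitted vectorizer and classifier under the file names listed above."""
    out = Path(out_dir)
    joblib.dump(vectorizer, out / "tfidf_vectorizer.pkl")
    joblib.dump(model, out / "model.pkl")


def load_artifacts(in_dir):
    """Load (vectorizer, model) back from disk."""
    src = Path(in_dir)
    return joblib.load(src / "tfidf_vectorizer.pkl"), joblib.load(src / "model.pkl")


def predict_condition(text, vectorizer, model):
    """Vectorize one description with the saved vocabulary and predict its condition."""
    return model.predict(vectorizer.transform([text]))[0]


# Hypothetical toy training data; the real artifacts come from the notebook.
texts = ["throbbing headache nausea", "blood sugar insulin", "sad hopeless mood"]
labels = ["Migraine", "Diabetes, Type 2", "Depression"]

vectorizer = TfidfVectorizer()
model = PassiveAggressiveClassifier(random_state=0)
model.fit(vectorizer.fit_transform(texts), labels)

# Round-trip through a temporary folder to mimic saving and reloading.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(vectorizer, model, tmp)
    loaded_vectorizer, loaded_model = load_artifacts(tmp)

print(predict_condition("headache and nausea", loaded_vectorizer, loaded_model))
```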

Future Improvements

Enhanced Model: Experiment with advanced NLP models like BERT or LSTM for better accuracy.
Broader Dataset: Incorporate additional datasets to cover more conditions and drugs.
Real-Time Updates: Integrate APIs for up-to-date drug information.
User Feedback: Add a feedback mechanism to refine predictions based on user input.
Multilingual Support: Extend preprocessing to handle non-English descriptions.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Repository: JavithNaseem-J/Patient-Condition-Prediction-NLP

Folders and files

Latest commit

History

Repository files navigation

Medical Condition Classifier

Overview

Dataset

Methodology

Installation

Usage

Files

Future Improvements

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages