Use a dataset containing medical data of patients to predict if a person has diabetes or not.
The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.
- This project is a part of my machine learning virtual internship at TechnoHacks Edutech
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Column | Description |
---|---|
Pregnancies | Number of times pregnant |
Glucose | Plasma glucose concentration at 2 hours in an oral glucose tolerance test |
Blood Pressure | Diastolic blood pressure (mm Hg) |
Skin Thickness | Triceps skin fold thickness (mm) |
Insulin | 2-Hour serum insulin (mu U/ml) |
BMI | Body mass index (weight in kg/(height in m)^2) |
Diabetes Pedigree Function | A function that scores the probability of diabetes based on family history |
Age | Age (years) |
Outcome | Class variable (0 or 1); 268 of 768 are 1, the others are 0 |
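
The class counts in the Outcome row imply an imbalanced target, so it helps to know what accuracy a trivial model would reach before comparing real ones. The short sketch below works that out from the two numbers in the table only; the variable names are illustrative and not taken from the project code.

```python
# Class counts taken from the Outcome row of the table above.
total_patients = 768
positive = 268                          # Outcome == 1
negative = total_patients - positive    # Outcome == 0

positive_rate = positive / total_patients

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(positive, negative) / total_patients

print(f"negative cases: {negative}")                                   # 500
print(f"positive rate: {positive_rate:.2%}")                           # 34.90%
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")    # 65.10%
```

Any model worth keeping should beat the roughly 65% that always predicting 0 would give on the full dataset.
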
- Logistic Regression
- K Neighbors Classifier
- Random Forest Classifier
- Support Vector Classification
- Decision Tree Classifier
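
The five models listed above can be compared with a loop like the one sketched here. This is a minimal illustration using scikit-learn, not the project's actual code. Several things are assumptions: the data is a synthetic stand-in generated with `make_classification` (768 rows, 8 predictors, roughly 35% positives), the split is a stratified 80/20 hold-out, features are standardized for the distance- and margin-based models, and all hyperparameters are library defaults. Accuracies from this sketch will not match the figure reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the real data (768 rows, 8 predictors,
# roughly 35% positives). Replace X and y with the real predictors and Outcome.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=42,
)

# ASSUMPTION: stratified 80/20 hold-out split; the original split is not stated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The five classifiers named in the list above, with default hyperparameters.
# Scaling is added for the models that are sensitive to feature magnitudes.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K Neighbors Classifier": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "Support Vector Classification": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Stratifying the split keeps the share of positive cases similar in the training and test sets, which matters when only about a third of the rows are positive.
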
Best Model Accuracy: 75.32%