| 1 | +# Vehicle Repairs Data Analysis |
| 2 | + |
| 3 | +## 📌 Overview |
| 4 | +This repository contains an Exploratory Data Analysis (EDA) of a **vehicle repairs dataset**. The dataset consists of **100 records and 52 columns**, capturing detailed information about vehicle repair transactions. The goal is to uncover trends, frequent issues, and potential improvements based on the data. |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## 📂 Dataset Description |
| 9 | + |
| 10 | +- **File Name**: `vehicle_repairs.csv` |
| 11 | +- **Records**: 100 |
| 12 | +- **Columns**: 52 |
| 13 | + |
| 14 | +### 🔑 Key Columns |
| 15 | +- `VIN`: Vehicle Identification Number |
| 16 | +- `TRANSACTION_ID`: Unique repair transaction ID |
| 17 | +- `CORRECTION_VERBATIM`: Description of the repair done |
| 18 | +- `CUSTOMER_VERBATIM`: Customer's issue description |
| 19 | +- `REPAIR_DATE`: Date of repair |
| 20 | +- `CAUSAL_PART_NM`: Part responsible for the issue |
| 21 | +- `GLOBAL_LABOR_CODE_DESCRIPTION`: Type of repair performed |
| 22 | +- `PLATFORM`: Vehicle platform (e.g., Full-Size Trucks, BEV) |
| 23 | +- `BODY_STYLE`: Body style (e.g., Crew Cab, 4 Door Utility) |
| 24 | +- `REPORTING_COST`, `TOTALCOST`, `LBRCOST`: Cost-related metrics |
| 25 | +- _...and many more_ |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## 📊 Analysis Highlights |
| 30 | + |
| 31 | +### 🔍 Data Exploration |
| 32 | +- Examined dataset shape, data types, and missing values |
| 33 | +- Generated descriptive statistics for numerical and categorical columns |
| 34 | +- Identified unique values and frequent patterns |
| 35 | + |
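The exploration steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the `summarize` helper is a hypothetical name, and the small `sample` frame is made-up data standing in for `vehicle_repairs.csv`.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic facts an exploration pass usually looks at."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # data type per column
        "missing": df.isna().sum().to_dict(),        # missing values per column
        "n_unique": df.nunique().to_dict(),          # distinct non-missing values
    }


# Made-up rows for illustration only; not taken from vehicle_repairs.csv.
sample = pd.DataFrame(
    {
        "PLATFORM": ["Full-Size Trucks", "BEV", None],
        "TOTALCOST": [120.5, None, 80.0],
    }
)

summary = summarize(sample)
print(summary["shape"])
print(summary["missing"])
print(sample.describe(include="all"))        # descriptive statistics
print(sample["PLATFORM"].value_counts())     # frequent values in one column
```

With the real file, the same helper could be applied to `pd.read_csv("vehicle_repairs.csv")`.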
| 36 | +### 🧹 Data Cleaning |
| 37 | +- Replaced missing categorical values with `"Unknown"` |
| 38 | +- Substituted corrupted characters with `"Corrupt Value"` |
| 39 | +- Filled missing `TOTALCOST` values using `REPORTING_COST` for consistency |
| 40 | + |
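A hedged sketch of the three cleaning rules listed above, using pandas. Several details are assumptions rather than facts from the notebook: the `clean` helper is a hypothetical name, every non-numeric column is treated as categorical, and a value counts as corrupted when it contains the Unicode replacement character (`U+FFFD`). The sample rows are invented.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning rules to a copy of the input frame."""
    out = df.copy()

    # Fill missing TOTALCOST values from REPORTING_COST.
    out["TOTALCOST"] = out["TOTALCOST"].fillna(out["REPORTING_COST"])

    # Assumption: all non-numeric columns are text/categorical columns.
    text_cols = out.select_dtypes(exclude="number").columns

    # Replace missing categorical values with "Unknown".
    out[text_cols] = out[text_cols].fillna("Unknown")

    # Assumption: a value is corrupted if it contains U+FFFD.
    for col in text_cols:
        corrupt = out[col].str.contains("\ufffd", regex=False)
        out[col] = out[col].mask(corrupt, "Corrupt Value")

    return out


# Made-up rows for illustration only; not taken from vehicle_repairs.csv.
sample = pd.DataFrame(
    {
        "CAUSAL_PART_NM": ["Steering wheel", None, "Mod\ufffdule"],
        "REPORTING_COST": [100.0, 250.0, 75.0],
        "TOTALCOST": [100.0, None, 75.0],
    }
)

cleaned = clean(sample)
print(cleaned)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare values before and after cleaning.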
| 41 | +### 💡 Key Insights |
| 42 | +- **Common Repairs**: Steering wheel-related issues were most frequent |
| 43 | +- **Platform Trends**: Full-Size Trucks had the highest repair count |
| 44 | +- **Cost Distribution**: Repair costs varied widely, with a few high-cost outliers |
| 45 | +- **Geographical Patterns**: Most repairs occurred in the US, especially in CA and TX |
| 46 | + |
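Findings such as the most frequent platform or the high-cost outliers can be checked with a few lines of pandas. The sketch below is illustrative only: the helper names are hypothetical, the sample rows are invented, and the `Q3 + 1.5 * IQR` threshold is one common outlier rule chosen here as an assumption, not necessarily the one used in the notebook.

```python
import pandas as pd


def most_frequent(df: pd.DataFrame, column: str) -> str:
    """Return the most common non-missing value in a column."""
    return df[column].value_counts().idxmax()


def high_cost_outliers(costs: pd.Series) -> pd.Series:
    """Return costs above Q3 + 1.5 * IQR (assumed outlier rule)."""
    q1, q3 = costs.quantile(0.25), costs.quantile(0.75)
    upper = q3 + 1.5 * (q3 - q1)
    return costs[costs > upper]


# Made-up rows for illustration only; not taken from vehicle_repairs.csv.
sample = pd.DataFrame(
    {
        "PLATFORM": ["Full-Size Trucks", "BEV", "Full-Size Trucks", "BEV", "Full-Size Trucks"],
        "TOTALCOST": [50.0, 60.0, 70.0, 80.0, 1000.0],
    }
)

top_platform = most_frequent(sample, "PLATFORM")
outliers = high_cost_outliers(sample["TOTALCOST"])
print(top_platform)
print(outliers.tolist())
```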
| 47 | +--- |
| 48 | +## 💾 Output Files |
| 49 | + |
| 50 | +- `cleaned_steering_repair_data.csv`: Final cleaned version of the dataset after preprocessing |
| 51 | +- `generated_repair_tags.csv`: Extracted tags from `CORRECTION_VERBATIM` and `CUSTOMER_VERBATIM` columns for downstream use |
| 52 | + |
| 53 | +## 📁 Files |
| 54 | + |
| 55 | +- `eda.ipynb`: Jupyter notebook containing all EDA steps—loading, cleaning, exploration, and visualization |
| 56 | +- `vehicle_repairs.csv`: The original dataset in CSV format |
| 57 | +- `cleaned_vehicle_repairs.csv`: Cleaned dataset with consistent and processed values |
| 58 | +- `generated_tags.csv`: Tags generated from free-text fields (correction/customer verbatim) |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## 🧠 Notes |
| 63 | +This analysis can support further research on repair cost optimization, predictive maintenance, and customer experience improvements. |
| 64 | + |
| 65 | +--- |