Commit 6ed4e53

Create README.md
1 parent b0838a6 commit 6ed4e53

1 file changed: +65 -0 lines changed

README.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Vehicle Repairs Data Analysis

## 📌 Overview
This repository contains an Exploratory Data Analysis (EDA) of a **vehicle repairs dataset**. The dataset consists of **100 records and 52 columns**, capturing detailed information about vehicle repair transactions. The goal is to uncover trends, frequent issues, and potential improvements based on the data.

---

## 📂 Dataset Description

- **File Name**: `vehicle_repairs.csv`
- **Records**: 100
- **Columns**: 52

### 🔑 Key Columns
- `VIN`: Vehicle Identification Number
- `TRANSACTION_ID`: Unique repair transaction ID
- `CORRECTION_VERBATIM`: Description of the repair done
- `CUSTOMER_VERBATIM`: Customer's issue description
- `REPAIR_DATE`: Date of repair
- `CAUSAL_PART_NM`: Part responsible for the issue
- `GLOBAL_LABOR_CODE_DESCRIPTION`: Type of repair performed
- `PLATFORM`: Vehicle platform (e.g., Full-Size Trucks, BEV)
- `BODY_STYLE`: Body style (e.g., Crew Cab, 4 Door Utility)
- `REPORTING_COST`, `TOTALCOST`, `LBRCOST`: Cost-related metrics
- _...and many more_
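
A minimal sketch of loading the dataset and checking that the key columns are present, assuming pandas is used in the notebook. The sample rows, the `load_repairs` helper, and the ISO date format are illustrative assumptions, not taken from the real file; an in-memory CSV stands in for `vehicle_repairs.csv` so the snippet runs on its own.

```python
import io

import pandas as pd

# Invented stand-in rows; the real vehicle_repairs.csv has 100 rows and 52 columns.
SAMPLE_CSV = """VIN,TRANSACTION_ID,REPAIR_DATE,PLATFORM,TOTALCOST
VIN0001,1001,2024-01-02,Full-Size Trucks,120.50
VIN0002,1002,2024-01-05,BEV,
"""

# Subset of the key columns listed above, used as a sanity check.
EXPECTED_COLUMNS = ["VIN", "TRANSACTION_ID", "REPAIR_DATE", "PLATFORM", "TOTALCOST"]


def load_repairs(source):
    """Read the repairs CSV from a path or file-like object (hypothetical helper)."""
    # Assumes REPAIR_DATE is stored in a format pandas can parse.
    frame = pd.read_csv(source, parse_dates=["REPAIR_DATE"])
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return frame


# With the real data this would be load_repairs("vehicle_repairs.csv").
df = load_repairs(io.StringIO(SAMPLE_CSV))
print(df.shape)
```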

---

## 📊 Analysis Highlights

### 🔍 Data Exploration
- Examined dataset shape, data types, and missing values
- Generated descriptive statistics for numerical and categorical columns
- Identified unique values and frequent patterns
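
The exploration steps above could look roughly like the following, assuming a pandas DataFrame. The `summarize` helper and the sample rows are hypothetical; the notebook may organize these checks differently.

```python
import pandas as pd

# Invented sample rows that reuse column names from the dataset description.
df = pd.DataFrame({
    "PLATFORM": ["Full-Size Trucks", "BEV", "Full-Size Trucks", None],
    "BODY_STYLE": ["Crew Cab", "4 Door Utility", "Crew Cab", "Crew Cab"],
    "TOTALCOST": [120.5, 80.0, None, 300.0],
})


def summarize(frame):
    """Collect the basic EDA views in one dictionary (hypothetical helper)."""
    return {
        "shape": frame.shape,
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
        # Descriptive statistics for numeric columns only.
        "numeric_stats": frame.describe(),
        # count / unique / top / freq for the non-numeric columns.
        "categorical_stats": frame.select_dtypes(exclude="number").describe(),
        # Distinct non-null values per column.
        "unique_counts": frame.nunique().to_dict(),
    }


summary = summarize(df)
# Most frequent values in one categorical column.
platform_counts = df["PLATFORM"].value_counts()
print(summary["missing"])
print(platform_counts)
```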

### 🧹 Data Cleaning
- Replaced missing categorical values with `"Unknown"`
- Substituted corrupted characters with `"Corrupt Value"`
- Filled missing `TOTALCOST` values using `REPORTING_COST` for consistency
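
A sketch of the three cleaning steps, under stated assumptions: a value is treated as corrupted when it contains the Unicode replacement character (U+FFFD), and the whole cell is then replaced with `"Corrupt Value"`. The README does not say how corruption was detected, so this is one possible interpretation; the `clean_repairs` helper and the sample rows are hypothetical.

```python
import pandas as pd

# Invented sample rows; "\ufffd" is the Unicode replacement character.
df = pd.DataFrame({
    "CAUSAL_PART_NM": ["WHEEL ASM-STRG", None, "MODULE\ufffdASM"],
    "REPORTING_COST": [100.0, 55.0, 20.0],
    "TOTALCOST": [100.0, None, 25.0],
})


def clean_repairs(frame):
    """Return a cleaned copy; the input frame is left untouched."""
    out = frame.copy()
    # Treat every non-numeric, non-datetime column as categorical text.
    text_cols = out.select_dtypes(exclude=["number", "datetime"]).columns
    for col in text_cols:
        # Step 1: missing categorical values become "Unknown".
        out[col] = out[col].fillna("Unknown")
        # Step 2 (assumed heuristic): flag cells containing U+FFFD.
        corrupt = out[col].astype(str).str.contains("\ufffd", regex=False)
        out.loc[corrupt, col] = "Corrupt Value"
    # Step 3: fall back to REPORTING_COST where TOTALCOST is missing.
    out["TOTALCOST"] = out["TOTALCOST"].fillna(out["REPORTING_COST"])
    return out


cleaned = clean_repairs(df)
print(cleaned)
```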

### 💡 Key Insights
- **Common Repairs**: Steering wheel-related issues were most frequent
- **Platform Trends**: Full-Size Trucks had the highest repair count
- **Cost Distribution**: Repair costs varied significantly; a few were high-cost outliers
- **Geographical Patterns**: Most repairs occurred in the US, especially in CA and TX
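
Insights such as the platform ranking and the high-cost outliers can be checked with a few lines of pandas. The sketch below uses invented rows and the common 1.5 × IQR rule for outliers; the README does not state which outlier criterion the notebook used, so `iqr_outliers` is an assumption.

```python
import pandas as pd

# Invented sample rows for illustration only.
df = pd.DataFrame({
    "PLATFORM": ["Full-Size Trucks"] * 4 + ["BEV"] * 2,
    "TOTALCOST": [90.0, 100.0, 110.0, 120.0, 105.0, 900.0],
})

# Repair count per platform, highest first.
repairs_per_platform = df["PLATFORM"].value_counts()


def iqr_outliers(costs, k=1.5):
    """Return costs above Q3 + k * IQR (assumed outlier rule)."""
    q1, q3 = costs.quantile([0.25, 0.75])
    upper = q3 + k * (q3 - q1)
    return costs[costs > upper]


high_cost = iqr_outliers(df["TOTALCOST"])
print(repairs_per_platform)
print(high_cost)
```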

---
## 💾 Output Files

- `cleaned_steering_repair_data.csv`: Final cleaned version of the dataset after preprocessing
- `generated_repair_tags.csv`: Extracted tags from `CORRECTION_VERBATIM` and `CUSTOMER_VERBATIM` columns for downstream use
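
The README does not describe how tags were extracted from the free-text columns. One simple possibility is keyword matching, sketched below; the `TAG_KEYWORDS` vocabulary, the `extract_tags` helper, the `TAGS` column name, and the sample rows are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical tag vocabulary: tag name -> word prefixes that trigger it.
TAG_KEYWORDS = {
    "steering_wheel": ["steering wheel"],
    "heated": ["heat"],
    "replaced": ["replace"],
}


def extract_tags(text):
    """Return the tags whose keywords appear at a word start in the text."""
    lowered = str(text).lower()
    return [
        tag
        for tag, words in TAG_KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(word)}", lowered) for word in words)
    ]


# Invented sample rows for illustration only.
df = pd.DataFrame({
    "TRANSACTION_ID": [1001, 1002],
    "CUSTOMER_VERBATIM": ["Heated steering wheel not working", "Noise when turning"],
    "CORRECTION_VERBATIM": ["Replaced steering wheel", "Unknown"],
})

# Combine both free-text fields so a tag can come from either one.
combined = df["CUSTOMER_VERBATIM"].fillna("") + " " + df["CORRECTION_VERBATIM"].fillna("")
tags = pd.DataFrame({
    "TRANSACTION_ID": df["TRANSACTION_ID"],
    "TAGS": combined.map(lambda text: ";".join(extract_tags(text))),
})
# tags.to_csv("generated_repair_tags.csv", index=False) would write the output file.
print(tags)
```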

## 📁 Files

- `eda.ipynb`: Jupyter notebook containing all EDA steps: loading, cleaning, exploration, and visualization
- `vehicle_repairs.csv`: The original dataset in CSV format
- `cleaned_vehicle_repairs.csv`: Cleaned dataset with consistent and processed values
- `generated_tags.csv`: Tags generated from free-text fields (correction/customer verbatim)

---

## 🧠 Notes
This analysis can support further research on repair cost optimization, predictive maintenance, and customer experience improvements.

---
