Commit 210c206

1 parent 6ed4e53 commit 210c206

1 file changed

+119
-49
lines changed

README.md

Lines changed: 119 additions & 49 deletions
@@ -1,65 +1,135 @@
1-
# Vehicle Repairs Data Analysis
1+
# 🚗 Data Analysis Using Python: Vehicle Repairs EDA
2 2

3-
## 📌 Overview
4-
This repository contains an Exploratory Data Analysis (EDA) of a **vehicle repairs dataset**. The dataset consists of **100 records and 52 columns**, capturing detailed information about vehicle repair transactions. The goal is to uncover trends, frequent issues, and potential improvements based on the data.
3+
![Vehicle Repairs Analysis](https://img.shields.io/badge/Vehicle%20Repairs%20Analysis-EDA-blue)
5 4

6-
---
5+
Welcome to the **Data Analysis Using Python** repository! This project is an exploratory data analysis (EDA) of a vehicle repairs dataset that uncovers patterns in repair types, costs, and vehicle platforms. It covers data cleaning, insight extraction, and tag generation from free-text fields.
7 6

8-
## 📂 Dataset Description
9-
10-
- **File Name**: `vehicle_repairs.csv`
11-
- **Records**: 100
12-
- **Columns**: 52
13-
14-
### 🔑 Key Columns
15-
- `VIN`: Vehicle Identification Number
16-
- `TRANSACTION_ID`: Unique repair transaction ID
17-
- `CORRECTION_VERBATIM`: Description of the repair done
18-
- `CUSTOMER_VERBATIM`: Customer's issue description
19-
- `REPAIR_DATE`: Date of repair
20-
- `CAUSAL_PART_NM`: Part responsible for the issue
21-
- `GLOBAL_LABOR_CODE_DESCRIPTION`: Type of repair performed
22-
- `PLATFORM`: Vehicle platform (e.g., Full-Size Trucks, BEV)
23-
- `BODY_STYLE`: Body style (e.g., Crew Cab, 4 Door Utility)
24-
- `REPORTING_COST`, `TOTALCOST`, `LBRCOST`: Cost-related metrics
25-
- _...and many more_
7+
## Table of Contents
26 8

27-
---
9+
- [Project Overview](#project-overview)
10+
- [Features](#features)
11+
- [Technologies Used](#technologies-used)
12+
- [Getting Started](#getting-started)
13+
- [Usage](#usage)
14+
- [Data Cleaning Process](#data-cleaning-process)
15+
- [Insights Extraction](#insights-extraction)
16+
- [Tag Generation](#tag-generation)
17+
- [Visualizations](#visualizations)
18+
- [Release Information](#release-information)
19+
- [Contributing](#contributing)
20+
- [License](#license)
28 21

29-
## 📊 Analysis Highlights
22+
## Project Overview
30 23

31-
### 🔍 Data Exploration
32-
- Examined dataset shape, data types, and missing values
33-
- Generated descriptive statistics for numerical and categorical columns
34-
- Identified unique values and frequent patterns
24+
This project analyzes a vehicle repairs dataset to identify trends that can inform decision-making in vehicle maintenance and repair services. The analysis covers:
35 25

36-
### 🧹 Data Cleaning
37-
- Replaced missing categorical values with `"Unknown"`
38-
- Substituted corrupted characters with `"Corrupt Value"`
39-
- Filled missing `TOTALCOST` values using `REPORTING_COST` for consistency
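
Two of the cleaning steps listed above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names come from the key columns listed in this README, while the sample rows and the `clean_repairs` helper are hypothetical. The corrupted-character substitution is left out because the README does not say how corrupted values were detected.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the README.
df = pd.DataFrame({
    "PLATFORM": ["Full-Size Trucks", None, "BEV"],
    "BODY_STYLE": ["Crew Cab", "4 Door Utility", None],
    "REPORTING_COST": [120.0, 85.5, 40.0],
    "TOTALCOST": [120.0, np.nan, 40.0],
})

# Assumption: these are the categorical columns to fill.
CATEGORICAL_COLS = ["PLATFORM", "BODY_STYLE"]


def clean_repairs(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the frame; the input is left unchanged."""
    out = frame.copy()
    # Replace missing categorical values with "Unknown".
    out[CATEGORICAL_COLS] = out[CATEGORICAL_COLS].fillna("Unknown")
    # Fill missing TOTALCOST values from REPORTING_COST on the same row.
    out["TOTALCOST"] = out["TOTALCOST"].fillna(out["REPORTING_COST"])
    return out


cleaned = clean_repairs(df)
print(cleaned)
```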
26+
- **Repair Types**: Understanding the most common types of repairs.
27+
- **Costs**: Analyzing the cost distribution across different repairs.
28+
- **Vehicle Platforms**: Identifying which vehicle platforms incur higher repair costs.
40 29

41-
### 💡 Key Insights
42-
- **Common Repairs**: Steering wheel-related issues were most frequent
43-
- **Platform Trends**: Full-Size Trucks had the highest repair count
44-
- **Cost Distribution**: Repair costs varied significantly; a few were high-cost outliers
45-
- **Geographical Patterns**: Most repairs occurred in the US, especially in CA and TX
30+
## Features
46 31

47-
---
48-
## 💾 Output Files
32+
- Comprehensive exploratory data analysis (EDA)
33+
- Data cleaning to ensure data quality
34+
- Insights extraction for actionable outcomes
35+
- Tag generation from free-text fields
36+
- Visualizations to present findings clearly
37+
- Saving of cleaned datasets for further analysis
49 38

50-
- `cleaned_steering_repair_data.csv`: Final cleaned version of the dataset after preprocessing
51-
- `generated_repair_tags.csv`: Extracted tags from `CORRECTION_VERBATIM` and `CUSTOMER_VERBATIM` columns for downstream use
39+
## Technologies Used
52 40

53-
## 📁 Files
41+
This project uses the following libraries and tools:
54 42

55-
- `eda.ipynb`: Jupyter notebook containing all EDA steps—loading, cleaning, exploration, and visualization
56-
- `vehicle_repairs.csv`: The original dataset in csv format
57-
- `cleaned_vehicle_repairs.csv`: Cleaned dataset with consistent and processed values
58-
- `generated_tags.csv`: Tags generated from free-text fields (correction/customer verbatim)
43+
- **Python**: The main programming language used for analysis.
44+
- **Pandas**: For data manipulation and analysis.
45+
- **NumPy**: For numerical operations.
46+
- **Matplotlib**: For creating static visualizations.
47+
- **Seaborn**: For statistical data visualization.
48+
- **Jupyter Notebook**: For an interactive coding environment.
49+
- **Counter**: `collections.Counter` from the Python standard library, for counting hashable objects.
59 50

60-
---
51+
## Getting Started
52+
53+
To get started with this project, clone the repository to your local machine. Use the following command:
54+
55+
```bash
56+
git clone https://github.com/CyberTokyo112/data-analysis-using-python.git
57+
```
58+
59+
Navigate to the project directory:
60+
61+
```bash
62+
cd data-analysis-using-python
63+
```
64+
65+
Install the required libraries:
66+
67+
```bash
68+
pip install -r requirements.txt
69+
```
70+
71+
## Usage
72+
73+
To run the analysis, open the Jupyter Notebook file:
74+
75+
```bash
76+
jupyter notebook vehicle_repairs_analysis.ipynb
77+
```
78+
79+
Follow the instructions in the notebook to perform the analysis step by step.
80+
81+
## Data Cleaning Process
82+
83+
Data cleaning is crucial for ensuring the quality of analysis. In this project, we perform the following steps:
84+
85+
1. **Handling Missing Values**: Identify and address missing data points.
86+
2. **Removing Duplicates**: Ensure unique entries in the dataset.
87+
3. **Standardizing Formats**: Normalize formats for dates, text, and numerical values.
88+
4. **Outlier Detection**: Identify and handle outliers that may skew results.
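
The four steps above might look like this in pandas. Treat it as a sketch under stated assumptions: the column names (`TRANSACTION_ID`, `REPAIR_DATE`, `TOTALCOST`) are borrowed from the earlier version of this README, the rows are made up, and the 1.5 × IQR rule is one common outlier heuristic rather than the method the notebook is known to use.

```python
import pandas as pd

# Hypothetical sample data with one duplicate row, one bad date, one extreme cost.
df = pd.DataFrame({
    "TRANSACTION_ID": [1, 2, 2, 3, 4, 5],
    "REPAIR_DATE": ["2024-01-05", "2024-01-09", "2024-01-09",
                    "2024-02-11", "not a date", "2024-03-02"],
    "TOTALCOST": [100.0, 120.0, 120.0, 110.0, 130.0, 2500.0],
})

# 2. Removing duplicates: drop rows that are identical in every column.
deduped = df.drop_duplicates().reset_index(drop=True)

# 3. Standardizing formats: parse dates; unparseable values become NaT.
deduped["REPAIR_DATE"] = pd.to_datetime(
    deduped["REPAIR_DATE"], format="%Y-%m-%d", errors="coerce"
)

# 1. Handling missing values: count them per column before deciding what to do.
missing_counts = deduped.isna().sum()

# 4. Outlier detection: flag costs outside 1.5 * IQR of the quartiles.
q1 = deduped["TOTALCOST"].quantile(0.25)
q3 = deduped["TOTALCOST"].quantile(0.75)
iqr = q3 - q1
deduped["COST_OUTLIER"] = (
    (deduped["TOTALCOST"] < q1 - 1.5 * iqr)
    | (deduped["TOTALCOST"] > q3 + 1.5 * iqr)
)

print(missing_counts)
print(deduped)
```

Flagging outliers in a separate column, instead of dropping them, keeps the high-cost repairs available for later inspection.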
61 89

62-
## 🧠 Notes
63-
This analysis can support further research on repair cost optimization, predictive maintenance, and customer experience improvements.
90+
## Insights Extraction
91+
92+
After cleaning the data, we extract insights to understand trends. Key insights include:
93+
94+
- Most common repair types and their frequencies.
95+
- Average costs associated with different repair types.
96+
- Trends over time in repair requests.
97+
98+
These insights can guide businesses in making informed decisions.
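
The three kinds of insight listed above map onto standard pandas aggregations. The sketch below assumes the column names from the earlier version of this README (`GLOBAL_LABOR_CODE_DESCRIPTION`, `TOTALCOST`, `REPAIR_DATE`) and uses made-up rows, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the README.
df = pd.DataFrame({
    "GLOBAL_LABOR_CODE_DESCRIPTION": [
        "Steering Wheel Replacement",
        "Steering Wheel Replacement",
        "Module Reprogramming",
        "Steering Wheel Replacement",
    ],
    "TOTALCOST": [200.0, 300.0, 90.0, 250.0],
    "REPAIR_DATE": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15"]
    ),
})

# Most common repair types and their frequencies.
repair_counts = df["GLOBAL_LABOR_CODE_DESCRIPTION"].value_counts()

# Average cost per repair type.
avg_cost = df.groupby("GLOBAL_LABOR_CODE_DESCRIPTION")["TOTALCOST"].mean()

# Repair requests per calendar month.
monthly = df.groupby(df["REPAIR_DATE"].dt.to_period("M")).size()

print(repair_counts, avg_cost, monthly, sep="\n\n")
```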
99+
100+
## Tag Generation
101+
102+
Generating tags from free-text fields helps in categorizing data. This project uses simple string processing techniques to create meaningful tags. For example, repairs described as "engine failure" may be tagged as "engine" and "failure."
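
As a rough sketch of what such string processing could look like, the example below lowercases a description, splits it into words, and drops a few stop words. The `generate_tags` helper, the stop-word list, and the sample descriptions are hypothetical; the notebook's actual rules may differ.

```python
import re
from collections import Counter

# Assumption: a small hand-picked stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "with"}


def generate_tags(text: str) -> list[str]:
    """Split free text into lowercase word tags, without stop words or repeats."""
    words = re.findall(r"[a-z]+", text.lower())
    tags: list[str] = []
    for word in words:
        if word not in STOPWORDS and word not in tags:
            tags.append(word)
    return tags


# Hypothetical repair descriptions.
descriptions = [
    "Engine failure",
    "Replaced the steering wheel",
    "Steering wheel heating failure",
]

# Count how often each tag occurs across all descriptions.
tag_counts = Counter(tag for text in descriptions for tag in generate_tags(text))

print(generate_tags("engine failure"))
print(tag_counts.most_common(3))
```

`Counter` makes it easy to see which tags dominate the dataset, which is useful for deciding which ones are worth keeping.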
103+
104+
## Visualizations
105+
106+
Visualizations play a vital role in presenting findings. The project includes various charts and graphs to illustrate insights, such as:
107+
108+
- Bar charts showing the frequency of repair types.
109+
- Box plots displaying cost distributions.
110+
- Line graphs illustrating trends over time.
111+
112+
These visualizations make it easier to understand complex data at a glance.
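
The three chart types above can be produced with Matplotlib alone, as in the sketch below. All values are made-up placeholders rather than results from the dataset, and the notebook may well use Seaborn for some of these plots.

```python
import matplotlib.pyplot as plt

# Hypothetical placeholder values, not results from the dataset.
repair_counts = {"Steering wheel": 5, "Module": 3, "Wiring": 2}
costs_by_platform = {
    "Full-Size Trucks": [100, 150, 400],
    "BEV": [80, 120, 90],
}
months = ["2024-01", "2024-02", "2024-03"]
repairs_per_month = [4, 3, 3]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: frequency of repair types.
axes[0].bar(list(repair_counts.keys()), list(repair_counts.values()))
axes[0].set_title("Repairs by type")

# Box plot: cost distribution per platform (boxes sit at positions 1..n).
axes[1].boxplot(list(costs_by_platform.values()))
axes[1].set_xticks(range(1, len(costs_by_platform) + 1))
axes[1].set_xticklabels(list(costs_by_platform.keys()))
axes[1].set_title("Cost distribution by platform")

# Line graph: repairs over time.
axes[2].plot(months, repairs_per_month, marker="o")
axes[2].set_title("Repairs per month")

fig.tight_layout()
# In a script, call plt.show() or fig.savefig("charts.png") to see the result.
```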
113+
114+
## Release Information
115+
116+
For the latest releases, please visit the [Releases section](https://github.com/CyberTokyo112/data-analysis-using-python/releases). You can download and execute the files available there.
117+
118+
## Contributing
119+
120+
We welcome contributions to improve this project. If you have suggestions or improvements, please follow these steps:
121+
122+
1. Fork the repository.
123+
2. Create a new branch (`git checkout -b feature-branch`).
124+
3. Make your changes.
125+
4. Commit your changes (`git commit -m 'Add new feature'`).
126+
5. Push to the branch (`git push origin feature-branch`).
127+
6. Open a pull request.
128+
129+
## License
130+
131+
This project is licensed under the MIT License. See the LICENSE file for details.
64 132

65 133
---
134+
135+
Thank you for checking out the **Data Analysis Using Python** repository! We hope this project helps you gain insights into vehicle repairs and enhances your data analysis skills. For more details, visit the [Releases section](https://github.com/CyberTokyo112/data-analysis-using-python/releases).

0 commit comments
