This project focuses on analyzing the performance of electronic components using Python. The tasks include data storage and retrieval, exploratory data analysis (EDA), model selection, data collection, data cleansing, data visualization, data mapping, deviation calculation, and unit testing. Three datasets are provided:
• 📂 Train dataset (used to select ideal functions)
• 📂 Ideal dataset (contains 50 ideal functions)
• 📂 Test dataset (used for validation and mapping)
This analysis helps predict equipment failures by comparing real-world component performance against optimal readings.
The main task is to develop a Python program that:
- 🏆 Selects the four best-fitting functions from 50 available ideal functions using training data.
- 🔗 Uses the test data to map x-y pairs to one of the four ideal functions.
- 📊 Stores the mapping results along with deviation calculations.
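The selection step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each dataset is held as a dict mapping a column name to a list of y-values sampled at the same x-values, and the function names, column names, and toy numbers are all invented for the example.

```python
def sum_squared_error(train_y, ideal_y):
    """Sum of squared differences between two equally long series."""
    return sum((t - i) ** 2 for t, i in zip(train_y, ideal_y))


def select_best_fit(train, ideal):
    """For each training column, pick the ideal column with the smallest SSE.

    Returns {train_name: (best_ideal_name, sse)}.
    Assumes (for this sketch) that all series share the same x-values.
    """
    selection = {}
    for train_name, train_y in train.items():
        errors = {
            ideal_name: sum_squared_error(train_y, ideal_y)
            for ideal_name, ideal_y in ideal.items()
        }
        best = min(errors, key=errors.get)
        selection[train_name] = (best, errors[best])
    return selection


# Toy data, invented for illustration only (the real project has 50 ideal functions).
x = [0.0, 1.0, 2.0, 3.0]
train = {
    "y1": [0.1, 0.9, 2.1, 2.9],  # noisy version of y = x
    "y2": [0.0, 1.1, 3.9, 9.2],  # noisy version of y = x**2
}
ideal = {
    "ideal_linear": [xi for xi in x],
    "ideal_square": [xi ** 2 for xi in x],
    "ideal_const": [1.0 for _ in x],
}

selection = select_best_fit(train, ideal)
for train_name, (ideal_name, sse) in selection.items():
    print(f"{train_name} -> {ideal_name} (SSE = {sse:.4f})")
```

With four training columns and 50 ideal columns, the same loop would yield one best-fit ideal function per training column.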
The project aims to create a reliable and efficient Python program that:
• 🏆 Selects four ideal functions based on least squares error.
• 🔗 Maps test data to these ideal functions while considering deviation constraints.
• 📈 Evaluates performance using R-squared values and other error metrics.
The research questions are:
• 📌 How can we obtain the four best-fit ideal functions using the least squares method?
• 📌 What are the best alternative evaluation metrics for selection?
• 📌 Do alternative metrics yield the same ideal function choices as the least squares method?
• 📌 What are the R-squared values for the selected functions with test data?
• 📌 How does deviation change after mapping test data to ideal functions?
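One way to explore the questions about alternative metrics and R-squared is sketched below. The choice of mean absolute error and maximum absolute deviation as the alternative metrics is an assumption of this example (the text does not name the ones used), and the helper names and toy data are hypothetical.

```python
def sum_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def max_absolute_deviation(observed, predicted):
    return max(abs(o - p) for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot


def rank_by(metric, observed, candidates):
    """Order candidate names from best to worst under an error metric (lower is better)."""
    return sorted(candidates, key=lambda name: metric(observed, candidates[name]))


# Toy data, invented for illustration only.
observed = [0.1, 0.9, 2.1, 2.9]
candidates = {
    "linear": [0.0, 1.0, 2.0, 3.0],
    "shifted": [0.5, 1.5, 2.5, 3.5],
    "constant": [1.5, 1.5, 1.5, 1.5],
}

rankings = {
    "sse": rank_by(sum_squared_error, observed, candidates),
    "mae": rank_by(mean_absolute_error, observed, candidates),
    "max_abs": rank_by(max_absolute_deviation, observed, candidates),
}

# True when every metric picks the same best candidate.
same_choice = len({order[0] for order in rankings.values()}) == 1
r2_linear = r_squared(observed, candidates["linear"])
```

On real data the metrics may disagree, since squared error penalizes large individual deviations more heavily than mean absolute error does.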
This research is structured into three main sections:
• 📖 Introduction: Overview, problem definition, objectives, research questions.
• 🛠 Investigation Method: EDA, database storage, function selection, and mapping.
• 📌 Conclusion: Summary of results, future scope, and recommendations.
EDA techniques were applied to analyze dataset properties and relationships. Various visualizations were used, including box plots, scatter plots, and correlation matrices.
• 📊 Boxplot of Train Dataset
• 📈 Scatter plot with Regression Line
• 📊 Boxplot of Ideal Dataset
• 📊 Boxplot After Removing Duplicates
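The numbers behind a box plot, before and after removing duplicates, can be computed without any plotting library. The sketch below is an illustration under assumptions: rows are represented as `(x, y)` tuples, the quartile convention (`method="inclusive"`) is a choice made for this example, and the data is invented.

```python
from statistics import quantiles


def drop_duplicate_rows(rows):
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the statistics a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Toy (x, y) rows with two duplicated entries, invented for illustration only.
rows = [(0.0, 1.0), (1.0, 2.0), (1.0, 2.0), (2.0, 4.0), (3.0, 8.0), (3.0, 8.0)]

unique_rows = drop_duplicate_rows(rows)
summary_before = five_number_summary([y for _, y in rows])
summary_after = five_number_summary([y for _, y in unique_rows])
```

Comparing `summary_before` with `summary_after` shows how much the duplicated rows were pulling the quartiles.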
SQLite was used for storing datasets, accessed via SQLAlchemy ORM in Python.
🏷 Training Data Table
🏷 Ideal Functions Table
🏷 Test Data Mapping Table
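A possible layout for the three tables is sketched below. Note the substitution: the project accessed SQLite through the SQLAlchemy ORM, whereas this example uses the standard-library `sqlite3` module so that it runs without third-party packages. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory database for the sketch; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Training data: x plus four training columns (hypothetical names).
conn.execute(
    "CREATE TABLE training_data (x REAL PRIMARY KEY, y1 REAL, y2 REAL, y3 REAL, y4 REAL)"
)

# Ideal functions: x plus one column per ideal function (50 in total).
ideal_cols = ", ".join(f"y{i} REAL" for i in range(1, 51))
conn.execute(f"CREATE TABLE ideal_functions (x REAL PRIMARY KEY, {ideal_cols})")

# Test mapping: each test point, its deviation, and the ideal function it was mapped to.
conn.execute(
    "CREATE TABLE test_mapping (x REAL, y REAL, delta_y REAL, ideal_function TEXT)"
)

# Sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO training_data (x, y1, y2, y3, y4) VALUES (?, ?, ?, ?, ?)",
    [(0.0, 0.1, 0.0, 1.0, -1.0), (1.0, 0.9, 1.1, 2.0, -2.0)],
)
conn.execute(
    "INSERT INTO test_mapping (x, y, delta_y, ideal_function) VALUES (?, ?, ?, ?)",
    (1.0, 2.05, 0.05, "y7"),
)
conn.commit()

mapping_rows = conn.execute(
    "SELECT x, y, delta_y, ideal_function FROM test_mapping"
).fetchall()
ideal_column_count = len(conn.execute("PRAGMA table_info(ideal_functions)").fetchall())
training_row_count = conn.execute("SELECT COUNT(*) FROM training_data").fetchone()[0]
```

An ORM version would declare one mapped class per table with the same columns; the schema itself would not need to change.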
📐 Least Squares Analysis
A least-squares bar chart was generated for each training function, alongside plots of the fit and mapping results:
• 📊 Least Squares Bar Chart (Y1 Train Data)
• 📊 Least Squares Bar Chart (Y2 Train Data)
• 📊 Least Squares Bar Chart (Y3 Train Data)
• 📊 Least Squares Bar Chart (Y4 Train Data)
• 📈 Scatter Plot of Training vs. Ideal Functions
• 📊 R-Squared Values for Test Data Mapping
• 📊 Absolute Maximum Deviation Bar Chart
• 📈 Scatter Plot of Mapped Test Data
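The mapping step behind these plots can be sketched as follows. The exact deviation constraint is not stated here, so the rule used below is an assumption: a test point is accepted for an ideal function only if its deviation is at most a factor (here the square root of 2) times the largest deviation observed between the training data and that function. The function names, the factor, and the toy values are all hypothetical.

```python
import math


def map_test_point(x, y, ideal_functions, max_train_deviation, factor=math.sqrt(2)):
    """Map one test point to the closest admissible ideal function.

    ideal_functions: dict name -> callable returning the ideal y for a given x
    max_train_deviation: dict name -> largest |train - ideal| seen for that function
    factor: ASSUMPTION of this sketch; the threshold rule is not given in the text.
    Returns (name, deviation), or (None, None) if no function satisfies the constraint.
    """
    best_name, best_dev = None, None
    for name, func in ideal_functions.items():
        deviation = abs(y - func(x))
        if deviation <= factor * max_train_deviation[name]:
            if best_dev is None or deviation < best_dev:
                best_name, best_dev = name, deviation
    return best_name, best_dev


# Stand-ins for two selected ideal functions, invented for illustration only.
ideal_functions = {
    "linear": lambda x: x,
    "square": lambda x: x ** 2,
}
max_train_deviation = {"linear": 0.1, "square": 0.2}

test_points = [(2.0, 2.05), (3.0, 9.2), (1.0, 5.0)]
mapped = [
    (x, y, *map_test_point(x, y, ideal_functions, max_train_deviation))
    for x, y in test_points
]
mapped_names = [row[2] for row in mapped]
```

Points that satisfy no constraint stay unmapped (`None`), which keeps outliers out of the stored mapping results.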
The project successfully developed a Python program that:
• ✅ Selected the four best-fitting functions using least squares error.
• 🔗 Mapped test data points to these ideal functions.
• 📊 Evaluated the deviation between actual and ideal data.
Future scope and recommendations:
• 🔎 Investigate alternative evaluation metrics.
• 📈 Improve accuracy using advanced machine learning techniques.
• 🤖 Automate parameter tuning for better function selection.