I'm Virginia, a data scientist with a background in chemical engineering. Turning data into insights, models, and beautiful dashboards is my thing.
- 🐍 Python enthusiast focused on solving real-world problems through data.
- 🤖 Currently leveling up in machine learning and deep learning. Always exploring new techniques, always building something cool.
- 🎨 Big on data visualization. Passionate about creating visually engaging content.
- 📚 I document everything because good science and code should always be clear and reusable.
Here are the tools and libraries I use regularly to explore data, build models, and tell compelling stories:
- Languages & Frameworks: Python · Scikit-learn · Keras · TensorFlow · PyTorch · NumPy · Pandas
- Data Visualization: Matplotlib · Seaborn · Power BI · Dashboards
- Databases & Querying: SQL · MySQL
- Tools & Environment: Jupyter Notebooks · VS Code · Git · GitHub · TensorBoard
Welcome to my digital lab! Below, you’ll find a collection of my data science projects where I dive into data, build models, and apply algorithms to solve real-world problems.
Diving into data to uncover patterns, trends, and insights that lay the groundwork for further analysis or model building.
- Problem: Limited access to global chemical producers' data for market research.
- Solution: Built a web scraper to gather key data from multiple websites.
- Impact: Provides valuable information for market research, helping companies stay informed about competitors.
- Tools Used: BeautifulSoup, requests, NumPy, Pandas, Matplotlib, Seaborn
- Problem: Lack of clear insights on the effectiveness of different gym exercises for specific fitness goals.
- Solution: Conducted an exploratory data analysis (EDA) on a gym exercise dataset, using data visualization techniques to uncover patterns.
- Impact: Identified the most effective exercises for various fitness goals, providing actionable insights for gym-goers and trainers.
- Tools Used: NumPy, Pandas, Matplotlib, Seaborn
- Problem: Difficulty in understanding the key factors that influence box-office revenue.
- Solution: Retrieved and analyzed movie data using API calls, followed by exploratory data analysis (EDA) with visualization and statistical techniques to uncover patterns and trends.
- Impact: Provides actionable insights for film production studios to optimize revenue forecasting and make data-driven decisions.
- Tools Used: urllib.request, gzip, json, NumPy, Pandas, Matplotlib, Seaborn
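A minimal sketch of the retrieval step in the movie project, using only `gzip` and `json` from the tools listed above. It assumes the API returns a gzip-compressed export with one JSON object per line, which this README does not confirm. The function name `parse_gzipped_json_lines` and the sample records are hypothetical. In practice the raw bytes would come from `urllib.request.urlopen(...).read()`, but the sketch builds them in memory so it runs offline.

```python
import gzip
import json


def parse_gzipped_json_lines(payload: bytes) -> list:
    """Decompress a gzip payload and parse one JSON object per non-empty line."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample records standing in for a real API export.
sample_records = [
    {"id": 1, "title": "Example A"},
    {"id": 2, "title": "Example B"},
]

# Build the compressed payload in memory instead of downloading it.
sample_payload = gzip.compress(
    "\n".join(json.dumps(record) for record in sample_records).encode("utf-8")
)

movies = parse_gzipped_json_lines(sample_payload)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(movies)` for the EDA step.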
- Problem: Understanding global CO2 emissions trends to inform environmental policymaking.
- Solution: Conducted exploratory data analysis (EDA) on CO2 emissions data to uncover key trends and patterns.
- Impact: Provides insights that help policymakers identify trends and set data-driven targets for emissions reduction.
- Tools Used: NumPy, Pandas, Matplotlib, Seaborn
Building and fine-tuning models to solve problems like prediction, classification, and forecasting using real-world data.
- Problem: High energy costs in the steel industry driven by inefficient energy usage.
- Solution: Developed a regression model to predict future energy consumption based on historical data.
- Impact: Provides insights that help reduce energy costs and optimize usage, leading to improved efficiency.
- Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, pickle
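A rough sketch of the fit-then-persist workflow behind the energy project, assuming a plain linear regression on synthetic data. The project's actual features, target, and model type are not shown in this README, so everything below is illustrative only.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data; the real dataset is not part of this README.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))  # three hypothetical process features
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

model = LinearRegression().fit(X, y)

# Serialize the fitted model to bytes and restore it,
# the same round trip pickle.dump / pickle.load perform with a file.
restored = pickle.loads(pickle.dumps(model))
```

Pickling the fitted estimator lets a dashboard or script reuse the model without retraining. Only unpickle files from a trusted source.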
- Problem: Downtime from unexpected machinery failures leads to significant financial losses.
- Solution: Developed a Random Forest classifier to predict machinery breakdowns based on historical data.
- Impact: Helps reduce downtime and maintenance costs by enabling proactive repairs.
- Tools Used: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, imblearn, pickle
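Failure records are usually rare, so class imbalance is the main difficulty in this kind of classifier. The sketch below illustrates the idea on synthetic data. It swaps the imblearn resampling listed above for scikit-learn's built-in `class_weight="balanced"` option to stay within one library, so it is not the project's actual pipeline. The 5% failure rate and the feature count are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 5% positive ("failure") samples (assumed rate).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=0
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
failure_recall = recall_score(y_test, y_pred)  # share of true failures caught
```

Recall on the failure class is a more informative metric here than overall accuracy, because a model that never predicts a failure would still score about 95% accuracy on data like this.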
- Problem: Manually detecting defects in steel products is time-consuming and prone to errors.
- Solution: Built a classification model using XGBoost to automate defect detection, improving both speed and accuracy.
- Impact: Increases accuracy in defect identification, reduces wastage, and boosts production efficiency.
- Tools Used: NumPy, Pandas, XGBoost, Scikit-learn, Matplotlib, Seaborn, SMOTE, pickle
Applying neural networks to tackle complex tasks like image classification, time-series forecasting, and other advanced challenges.
- Problem: Need for accurate classification of images from the CIFAR-10 dataset, which includes 10 diverse categories.
- Solution: Built a Convolutional Neural Network (CNN) using TensorFlow and Keras to classify images, optimizing for accuracy and model performance.
- Impact: Achieved improved image classification accuracy, with potential applications in real-world image recognition tasks like object detection and autonomous systems.
- Tools Used: TensorFlow, Keras, NumPy, Pandas, Matplotlib, Seaborn
- Problem: Need for accurate prediction of air quality to safeguard public health.
- Solution: Developed an LSTM model for time-series forecasting to predict air quality based on historical data.
- Impact: Empowers cities to anticipate air quality trends and take preventive measures, improving public health outcomes.
- Tools Used: NumPy, Pandas, TensorFlow, Keras, Matplotlib, Seaborn, Statsmodels, Scikit-learn
Applying text analysis techniques to uncover insights, identify patterns, and understand language data.
- Problem: Difficulty in uncovering underlying topics within large news datasets for content analysis.
- Solution: Applied LDA (Latent Dirichlet Allocation) topic modeling to extract and categorize key topics from news articles.
- Impact: Provides insights for content analysis, media strategy, and topic prediction, improving content personalization and decision-making.
- Tools Used: Gensim, NLTK, Pandas, Matplotlib, Seaborn, pyLDAvis
- Problem: Extracting sentiment and topics from large literary datasets, particularly from F. Scott Fitzgerald’s works.
- Solution: Applied Cardiff RoBERTa for sentiment analysis and BERTopic for topic modeling to analyze sentiment and extract key themes from Fitzgerald’s texts.
- Impact: Provides deeper insights into literary themes and sentiments, enhancing the understanding of F. Scott Fitzgerald’s work and its emotional depth.
- Tools Used: Cardiff RoBERTa, BERTopic, NLTK, Transformers, SentenceTransformers, UMAP, HDBSCAN, Gensim, Matplotlib, Seaborn, WordCloud
Designing and optimizing SQL queries to extract, clean, and prepare data for analysis or modeling.
Creating interactive visualizations and dashboards to transform complex data into clear, actionable insights.
Developing Python scripts to automate repetitive tasks and workflows, saving time and boosting efficiency.
I’m always exploring new ideas, building, experimenting, and sharing projects that mix code, design, and storytelling.
🎯 Let’s connect! If you're into data, ML, or just cool visualizations, let’s chat!