This project performs sentiment analysis on a dataset of website reviews. The goal is to classify each review as expressing positive or negative sentiment using several machine learning models: Logistic Regression, Random Forest, Support Vector Classifier (SVC), and XGBoost. The models are tuned and then evaluated with accuracy, classification reports, and confusion matrices.
- Project Overview
- Technologies Used
- Dataset
- Installation and Setup
- Data Preprocessing
- Models
- Model Evaluation
- Results
- Contributions
- License
In this project, we perform sentiment analysis on a collection of website reviews. The dataset contains the website name, the review text, and a sentiment label indicating whether the review is positive or negative. The text is preprocessed to remove stop words and punctuation and to apply stemming. Several machine learning models are trained to predict sentiment, and hyperparameter tuning is performed to improve their performance.
- Data Loading: We load training and testing datasets from CSV files.
- Data Preprocessing: Text data is cleaned and transformed to remove unnecessary words and punctuation.
- Feature Representation: We use techniques like Bag of Words (BoW) and TF-IDF to represent the text data numerically (see the sketch after this list).
- Model Training: Various classifiers such as Logistic Regression, Random Forest, SVC, and XGBoost are trained and evaluated.
- Model Evaluation: Models are evaluated using accuracy, classification report, and confusion matrix.
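As a rough illustration of the feature representation step referenced above, the snippet below builds both a Bag of Words and a TF-IDF matrix with scikit-learn. The example reviews and default parameters are placeholders, not the project's actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder reviews; in the project these come from the review text column.
reviews = [
    "great site, loved the checkout process",
    "terrible experience, would not recommend",
]

# Bag of Words: raw token counts per review.
bow_vectorizer = CountVectorizer()
X_bow = bow_vectorizer.fit_transform(reviews)

# TF-IDF: token counts reweighted by how rare each token is across reviews.
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(reviews)

print(X_bow.shape, X_tfidf.shape)
```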
- Python (for implementation)
- pandas (for data manipulation)
- NumPy (for numerical operations)
- scikit-learn (for machine learning models, feature extraction, and evaluation)
- XGBoost (for gradient boosting model)
- nltk (for natural language processing tasks)
- matplotlib (for data visualization)
- seaborn (for generating confusion matrix heatmaps)
The dataset used in this project contains website reviews with the following columns:
- website_name: Name of the website being reviewed.
- text: Review text.
- is_positive_sentiment: Sentiment label (0 for negative, 1 for positive).
The dataset is divided into training and testing sets: `x_train.csv`, `y_train.csv`, `x_test.csv`, and `y_test.csv`.
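A minimal loading sketch, assuming the files sit in a `Dataset` directory and the columns match the names listed above (adjust paths and column names to your copy of the data):

```python
import pandas as pd

# Review text and sentiment labels for the train/test split.
x_train = pd.read_csv("Dataset/x_train.csv")  # columns: website_name, text
y_train = pd.read_csv("Dataset/y_train.csv")  # column: is_positive_sentiment
x_test = pd.read_csv("Dataset/x_test.csv")
y_test = pd.read_csv("Dataset/y_test.csv")

print(x_train.shape, x_test.shape)
```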
To run this project locally, follow these steps:
- Clone the repository: `git clone https://github.com/yourusername/sentiment-analysis.git`
- Install the required dependencies: `pip install -r requirements.txt`
- Ensure that the dataset files (`x_train.csv`, `y_train.csv`, `x_test.csv`, `y_test.csv`) are placed in the `Dataset` directory.
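If NLTK's stopword list has not been downloaded before, fetch it once before running the preprocessing code (the exact resources required depend on the code; `stopwords` is the one implied by this README):

```python
import nltk

# One-time download of the stop word list used during preprocessing.
nltk.download("stopwords")
```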
The following steps are applied during data preprocessing:
- Text data is converted to lowercase.
- Punctuation is removed.
- Stop words are filtered out.
- Stemming is applied using the PorterStemmer from NLTK.
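A sketch of these steps combined into a single cleaning function; the exact order and tokenization used in the project code may differ:

```python
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean_text(text: str) -> str:
    # Lowercase the review text.
    text = text.lower()
    # Strip punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop stop words and stem the remaining tokens.
    tokens = [stemmer.stem(word) for word in text.split() if word not in stop_words]
    return " ".join(tokens)

print(clean_text("This website was absolutely amazing!"))
```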
The following models are used for sentiment classification:
- Logistic Regression: A simple linear model for binary classification.
- Random Forest: An ensemble model that combines multiple decision trees.
- Support Vector Classifier (SVC): A model that finds the hyperplane that best separates the data into different classes.
- XGBoost: A gradient boosting algorithm for optimized performance.
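The classifiers can be instantiated roughly as follows; the settings shown are illustrative defaults, not the tuned values used in the project:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# The four classifiers compared in this project.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "SVC": SVC(kernel="rbf"),
    "XGBoost": XGBClassifier(),
}

# Each model is then fitted on the vectorized training data, e.g.:
# for name, model in models.items():
#     model.fit(X_train_tfidf, y_train)
```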
The following hyperparameters are tuned for each model (a tuning sketch follows this list):
- Logistic Regression: regularization strength (`C`), penalty type (`l1` or `l2`), and solver type.
- Random Forest: number of estimators, maximum depth, and minimum samples for splitting and leaf nodes.
- SVC: regularization strength (`C`), kernel function, and kernel coefficient (`gamma`).
- XGBoost: number of estimators, maximum depth of trees, learning rate, and subsampling ratio.
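The README does not fix the search strategy, so the sketch below assumes scikit-learn's `GridSearchCV` with an illustrative grid for Logistic Regression mirroring the parameters listed above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Example grid; the values are placeholders, not the project's actual search space.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],  # liblinear supports both l1 and l2 penalties
}

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
# grid.fit(X_train_tfidf, y_train)   # vectorized features and labels from earlier steps
# print(grid.best_params_, grid.best_score_)
```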
The models are evaluated using the following metrics:
- Accuracy: The proportion of correct predictions.
- Classification Report: Precision, recall, and F1-score for each class.
- Confusion Matrix: A matrix showing the true positives, false positives, true negatives, and false negatives.
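These metrics map directly onto scikit-learn and seaborn utilities; a sketch with placeholder labels (in the project, `y_true` is the test labels and `y_pred` a fitted model's predictions):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred))

# Confusion matrix rendered as a heatmap, matching the seaborn usage noted above.
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["negative", "positive"],
            yticklabels=["negative", "positive"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```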
Each model is evaluated on the test set, and the evaluation results (accuracy, classification report, and confusion matrix) are printed for comparison.
Contributions are welcome! Feel free to fork the project, open issues, or submit pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.