Automated URL redirection mapping tool for website migrations. Uses multiple similarity algorithms to match 404 URLs with the best live counterparts.
- 404 Status Verification: Checks which URLs are truly 404 before processing.
- Data Cleaning: Removes duplicates and unnecessary URL parameters.
- Advanced Matching Algorithms:
  - Fuzzy Matching
  - Levenshtein Distance
  - Jaccard Similarity
  - Hamming Distance
  - Ratcliff/Obershelp
  - Tversky Index
  - spaCy NLP
  - TF-IDF Vectorization
  - Jaro-Winkler Similarity
  - BERTopic Clustering
- Scoring System: Aggregates results from multiple algorithms to determine the best redirect.
- Excel Output: Saves the final redirection mapping as an Excel file.
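
As a rough illustration of how the cleaning, matching, and scoring steps listed above might fit together, the sketch below uses only the Python standard library. It is not the tool's actual implementation: the function names (`clean_url`, `combined_score`, `best_redirect`) are hypothetical, only three of the listed measures are shown, and the unweighted average is an assumed aggregation rule.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlsplit, urlunsplit


def clean_url(url: str) -> str:
    """Drop the query string and fragment, lowercase the host, trim the trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def path_tokens(url: str) -> set:
    """Split the URL path into lowercase alphanumeric tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", urlsplit(url).path.lower()) if t}


def ratcliff_obershelp(a: str, b: str) -> float:
    # difflib's SequenceMatcher is based on the Ratcliff/Obershelp approach.
    return SequenceMatcher(None, a, b).ratio()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two URLs' path-token sets."""
    ta, tb = path_tokens(a), path_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def levenshtein_similarity(a: str, b: str) -> float:
    """1 minus the edit distance divided by the longer string's length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def combined_score(a: str, b: str) -> float:
    """Assumed aggregation: plain average of the three similarity scores."""
    scores = (ratcliff_obershelp(a, b), jaccard(a, b), levenshtein_similarity(a, b))
    return sum(scores) / len(scores)


def best_redirect(dead_url: str, live_urls: list) -> tuple:
    """Return the highest-scoring live URL and its score for one 404 URL."""
    dead = clean_url(dead_url)
    candidates = sorted({clean_url(u) for u in live_urls})  # set removes duplicates
    best = max(candidates, key=lambda u: combined_score(dead, u))
    return best, combined_score(dead, best)


if __name__ == "__main__":
    live = ["https://example.com/blog/seo-tips/", "https://example.com/contact"]
    print(best_redirect("https://example.com/blog/seo-tips-2020?utm_source=x", live))
```

Averaging keeps the example short; a real scoring system could just as well weight some algorithms more heavily or use a majority vote across them.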
- Clone the repository:
    git clone https://github.com/yourusername/URL-Redirect-Migrator.git
    cd URL-Redirect-Migrator
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate   # On macOS/Linux
    venv\Scripts\activate      # On Windows
- Install dependencies:
    pip install -r requirements.txt
- Prepare two Excel files:
  - One containing the 404 URLs that need redirection.
  - One containing the live URLs that the redirects should point to.
- Run the script:
    python redirect_mapper.py
- Follow the prompts to select the appropriate Excel files and sheets.
- The script will generate an output Excel file containing the best redirection mapping.
The output Excel file has two sheets:
- Mapping: Full algorithm analysis with scores.
- Redirects: Cleaned final redirection list.
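
To make the relationship between the two sheets concrete, here is a small sketch in which plain dictionaries stand in for spreadsheet rows. The column names (`source`, `target`, `score`, and the per-algorithm columns) and the sample values are hypothetical; the real sheets may be laid out differently.

```python
# Hypothetical "Mapping" rows: every algorithm score is kept for inspection.
mapping_rows = [
    {"source": "/old-pricing", "target": "/pricing", "levenshtein": 0.67, "jaccard": 0.50, "score": 0.58},
    {"source": "/blog/seo-2020", "target": "/blog/seo", "levenshtein": 0.64, "jaccard": 0.67, "score": 0.65},
    {"source": "/old-pricing", "target": "/pricing", "levenshtein": 0.67, "jaccard": 0.50, "score": 0.58},
]


def to_redirects(rows):
    """Keep one source/target pair per source URL and drop all score columns."""
    seen = set()
    redirects = []
    for row in rows:
        if row["source"] in seen:
            continue  # skip repeated sources so each URL is redirected once
        seen.add(row["source"])
        redirects.append({"source": row["source"], "target": row["target"]})
    return redirects


if __name__ == "__main__":
    for redirect in to_redirects(mapping_rows):
        print(redirect["source"], "->", redirect["target"])
```

The `Mapping` view is what you would review when a match looks wrong, while the `Redirects` view is the part you would hand to whoever configures the server.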
Requirements:
- Python 3.8+
- pandas, numpy, httpx, fuzzywuzzy, Levenshtein, scipy, spacy, joblib, scikit-learn, bertopic, jellyfish
- difflib (part of the Python standard library, so no separate install is needed)
Feel free to fork this project and submit pull requests.
This project is licensed under the MIT License.