Skip to content

A deep learning and computer vision-based system to detect similar or duplicate images using ResNet features, PCA, and FAISS for fast similarity search. Ideal for visual search, clustering, and moderation.

Notifications You must be signed in to change notification settings

mdhasnainali/Image-Similarity-Detection

Repository files navigation

🖼️ Image Similarity Detection

Advanced Deep Learning & Computer Vision Based Similarity Analysis

Python PyTorch FAISS scikit-learn Matplotlib


🌟 Overview

This project delivers an advanced system for detecting similar or duplicate images using cutting-edge deep learning and computer vision. It's built to efficiently handle large datasets, making it ideal for applications like content moderation, copyright protection, image clustering, and enhancing visual search capabilities. We tackle the challenge of identifying visually similar images, even those with minor alterations, using robust feature extraction and rapid search techniques.

🎯 Core Features

  • 🧠 Deep Feature Extraction: Leverages pre-trained models (like ResNet) to generate powerful feature vectors that capture the essence of images, resilient to changes in lighting or orientation.
  • Efficient Dimensionality Reduction: Employs techniques like PCA (Principal Component Analysis) to reduce the complexity of high-dimensional features, speeding up computations while retaining key information.
  • Blazing-Fast Similarity Search: Integrates FAISS (Facebook AI Similarity Search) for highly optimized indexing and querying, enabling rapid identification of similar images even in massive collections.
  • 📊 Insightful Visualization: Utilizes tools like t-SNE to map the high-dimensional feature space into 2D, providing visual insights into how images cluster based on similarity.

🛠 Technical Approach

  • Feature Extraction: Using PyTorch with pre-trained CNNs (e.g., ResNet) to create dense vector representations of images.
  • Dimensionality Reduction: Applying scikit-learn's PCA to streamline feature vectors.
  • Indexing & Search: Building efficient search indices with FAISS.
  • Visualization: Employing t-SNE (via scikit-learn) and Matplotlib for exploring feature distributions.

Technology Stack:

  • 🐍 Python 3.6+
  • 🔥 PyTorch
  • 🚀 FAISS
  • ⚙️ scikit-learn
  • 🎨 Matplotlib
  • 🖼️ Pillow
  • 💡 CUDA (Optional, for GPU acceleration)

📁 System Architecture

project-root/
├── data/                       # Directory for image datasets
├── extract_features.ipynb      # Notebook for feature extraction
├── image_similarity.ipynb      # Notebook for similarity search experiments
├── visualize_similarity.ipynb  # Notebook for visual analysis and clustering
├── pickle/                     # Directory for serialized features and metadata
│   ├── filenames-*.pickle
│   └── features-*.pickle
├── models/                     # Pre-trained deep learning models (e.g., ResNet)
├── utils/                      # Utility scripts and helper functions
├── requirements.txt            # List of dependencies
└── README.md                   # Project documentation

🚀 Quick Start

Prerequisites

  • Python 3.6 or higher
  • PyTorch
  • FAISS (CPU or GPU version)
  • scikit-learn
  • Matplotlib
  • Pillow

Hardware

  • 💻 CPU: Runs on standard CPUs (slower).
  • ⚡ GPU: CUDA-compatible GPU highly recommended for significant speedup.

Setup

# Clone the repository (if applicable)
# git clone [email protected]:mdhasnainali/Image-Similarity-Detection.git
# cd Image-Similarity-Detection

# Install dependencies
pip install -r requirements.txt

Usage

📂 Organize your images in the data/ directory. ⚙️ Run extract_features.ipynb to process images and save features/filenames to the pickle/ directory. 🔍 Use image_similarity.ipynb to input a query image and find its most similar matches. 📊 Explore feature clusters using visualize_similarity.ipynb.

📈 Performance Evaluation

  • ✅ Fast query response times.
  • 🎯 High precision in identifying similar and duplicate images.
  • 🎨 Effective visualization of image clusters in the feature space.

🌱 Future Improvements

  • 🔗 Integration with real-time image ingestion pipelines.
  • 🧩 Support for alternative feature extraction models (e.g., VGG, EfficientNet).
  • ✨ Enhanced interactive visualization tools.
  • 📱 Potential mobile application integration.

🤝 Contributing

Contributions make the open-source community amazing! Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

Distributed under the MIT License. See LICENSE file for more information.

🙏 Acknowledgements

  • FAISS developers for their efficient similarity search library.
  • The PyTorch team for the flexible deep learning framework.
  • Caltech101 dataset (used during development/testing).

About

A deep learning and computer vision-based system to detect similar or duplicate images using ResNet features, PCA, and FAISS for fast similarity search. Ideal for visual search, clustering, and moderation.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published