SmartEditor-Interactive-OCR-with-Confidence-Filtering

SmartEditor is a Python-based OCR tool that combines traditional OCR with real-time user correction. It uses Tesseract OCR, confidence score filtering, and a Tkinter GUI to extract and edit text from scanned documents and noisy image files, reducing manual post-processing.

🧠 Key Features

📷 Image preprocessing: grayscale + Gaussian blur
🔍 Confidence-based word filtering (Tesseract)
🖼 Real-time OCR output with GUI editing (Tkinter)
💾 Saves user-edited text in .txt format
🎯 Tested on scanned documents, signage, forms, and handwritten images

🚀 How It Works

1. Upload a scanned image or photo-based document.
2. Preprocessing enhances text visibility (grayscale, denoise, threshold).
3. OCR & Filtering:
   - Tesseract extracts words and confidence scores.
   - Only words with confidence ≥ 60 are passed to the GUI.
4. GUI Interaction:
   - Preview annotated image with bounding boxes.
   - Edit extracted text live.
5. Export final corrected text to edited_text.txt.
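
The confidence filter in the OCR & Filtering step can be sketched as follows. This is a minimal illustration, not the code from the script: it assumes word data shaped like the dictionary returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, and the function name `filter_words` and the sample values are hypothetical.

```python
from typing import Any


def filter_words(data: dict[str, list[Any]], min_conf: float = 60.0) -> list[dict[str, Any]]:
    """Keep only non-empty words whose confidence is at least min_conf.

    `data` is assumed to hold parallel lists under the keys
    "text", "conf", "left", "top", "width" and "height".
    """
    kept = []
    for i, text in enumerate(data["text"]):
        word = str(text).strip()
        try:
            # Confidence may arrive as int, float or string depending on version.
            conf = float(data["conf"][i])
        except (TypeError, ValueError):
            continue
        # Non-word rows (blocks, lines) carry empty text and a confidence of -1.
        if not word or conf < min_conf:
            continue
        kept.append({
            "text": word,
            "conf": conf,
            "box": (data["left"][i], data["top"][i], data["width"][i], data["height"][i]),
        })
    return kept


# Hand-written sample standing in for real OCR output (hypothetical values).
sample = {
    "text": ["", "Invoice", "t0tal", "42"],
    "conf": ["-1", 96.1, 41.0, 60],
    "left": [0, 10, 90, 150],
    "top": [0, 12, 12, 12],
    "width": [200, 70, 50, 20],
    "height": [40, 18, 18, 18],
}
words = filter_words(sample)
print(" ".join(w["text"] for w in words))
```

The retained `box` tuples could be used to draw the bounding boxes in the annotated preview, and the joined text is what a GUI text widget would receive for editing.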

🛠 Requirements

Python 3.10+
Tesseract OCR (install it and update the path in the script)
Libraries:
- pytesseract
- opencv-python
- numpy
- tkinter (part of the Python standard library; not installed via pip)
- Pillow

⚙️ Installation & Usage

Install Tesseract

Download: https://github.com/tesseract-ocr/tesseract

Set the path in the script:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
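
The hard-coded path above is specific to a default Windows install. As a hedged alternative, the sketch below looks for the `tesseract` executable on `PATH` first and only then falls back to the Windows location. The helper name `find_tesseract` is hypothetical and is not part of the script or of pytesseract.

```python
import shutil
from pathlib import Path

# Default Windows install location, taken from the line above.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is on PATH (typical for Linux and macOS installs).
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try the known fallback locations in order.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


tesseract_path = find_tesseract()
print(tesseract_path or "Tesseract not found; install it or set the path manually")
```

If a path is found, it can be assigned to `pytesseract.pytesseract.tesseract_cmd` in the same way as the line above.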

Install Python dependencies

pip install opencv-python numpy pytesseract Pillow

Run the application

python SmartEditor-Interactive-OCR-with-Confidence-Filtering.py

📊 Experimental Results

| Model | Word Accuracy | Character Accuracy |
|---|---|---|
| Default Tesseract | 64.8% | 77.8% |
| Convolutional Preprocess | n/a | 61.6% |
| Super-resolution + Tess. | 86.0% | 89.7% |
| EasyOCR + Post-process | 85–93% | n/a |
| SmartEditor (Proposed) | 75% | 73.0% |

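SmartEditor balances accuracy with user control: although its raw scores are below those of the super-resolution and EasyOCR pipelines, confidence filtering and in-GUI editing are intended to reduce the burden of manual proofreading.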
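
The README does not state how word and character accuracy were computed. One common convention, shown here purely as an assumption, is `1 - edit_distance / reference_length`, measured over words or over characters. The function names and sample strings below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Return 1 - distance / len(ref), clamped at 0 (assumed metric)."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def word_accuracy(ref_text, hyp_text):
    return accuracy(ref_text.split(), hyp_text.split())


def char_accuracy(ref_text, hyp_text):
    return accuracy(ref_text, hyp_text)


ground_truth = "total amount due 42"
ocr_output = "t0tal amount due 42"
print(f"word accuracy: {word_accuracy(ground_truth, ocr_output):.1%}")
print(f"char accuracy: {char_accuracy(ground_truth, ocr_output):.1%}")
```

With a single misread character, one of four words is wrong while only one of nineteen characters is, which shows how the two metrics can diverge on the same output.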

