The PDF Translation Project is an end-to-end solution for extracting text from PDF documents and translating it into a target language (default: Chinese) using OpenAI’s API. It integrates robust text extraction, a powerful translation service, and a user-friendly web interface. Additionally, the project provides optional capabilities for fine-tuning translation models using state-of-the-art techniques with mT5 and VBLoRA.
- Overview
- Features
- Project Structure
- Installation
- Configuration
- Running the Application
- Usage
- Fine-Tuning with mT5 and VBLoRA
- Troubleshooting
- License
- Acknowledgments
The project provides a comprehensive pipeline to:
- Extract text from PDFs, including Optical Character Recognition (OCR) for embedded images.
- Translate extracted text using an OpenAI-powered service.
- Display translations through an intuitive web interface.
- (Optional) Fine-tune your translation model using mT5 and VBLoRA for enhanced performance.
-
PDF Extraction:
Leverages robust methods for text extraction. For PDF extraction, OpenAI is used to summarize or extract text from images instead of conventional OCR techniques. -
Translation:
Translates the extracted text into the desired language using OpenAI’s API.
Default target language is Chinese. -
Web User Interface:
Provides a simple and responsive interface for PDF uploads and displaying translated content.
Note: When you click on the "Extract & Translate" button, please wait a minute as the system processes the translation until the translated text is shown. -
Model Fine-Tuning (Optional):
Experiment with fine-tuning your translation model using mT5 along with VBLoRA for parameter-efficient training.
Note: Initial experiments were paused due to high computational costs and resource limitations. However, the code is fully runnable if you have sufficient resources, so feel free to try it.
PDF-Translation/
├── app.py # Main FastAPI application exposing API endpoints.
├── crawler/
│ └── pdf_crawler.py # Utilities for PDF crawling and text extraction.
├── llm/
│ ├── openai_translation.py # Translation service utilizing OpenAI's API.
│ └── mt5_translation.py # Script for fine-tuning mT5 using VBLoRA (optional).
├── web/
│ ├── main.html # HTML file for the web interface.
│ └── main.js # JavaScript handling file uploads and API interactions.
│ └── styles.css # CSS script for frontend designing and styling.
└── ReadMe.md # Project documentation.
git clone https://github.com/yourusername/PDF-Translation.git
cd PDF-Translation
python3 -m venv venv
source venv/bin/activate
Ensure that your requirements.txt
file includes packages such as FastAPI, Uvicorn, Mangum, openai, Pillow, Transformers, and Datasets. Then run:
pip install -r requirements.txt
-
OpenAI API Key:
Update theOPENAI_API_KEY
constant in bothapp.py
andllm/openai_translation.py
with your valid OpenAI API key. -
Model Settings:
The default translation model is configured inllm/openai_translation.py
. Modify the settings if a different model is desired.
-
Start the FastAPI Server:
uvicorn app:app --reload
-
API Endpoints:
/extract_text
: Endpoint for PDF text extraction./translate
: Endpoint for text translation.
The API will be available at:
http://127.0.0.1:8000
-
Ensure the backend API is running.
-
Change to the
web
Directory:cd web
-
Serve the Static Files using Python’s HTTP Server:
python3 -m http.server 8000
-
Open your Browser and Navigate to:
http://localhost:8000/main.html
-
PDF Upload:
Use the web interface to upload a PDF file. The backend API extracts text (using OpenAI to either summarize or extract text from images) instead of traditional OCR methods. -
Translation:
The extracted text is sent to the/translate
endpoint, translated using the OpenAI-powered service, and the resulting translation is displayed in the web interface.
Tip: After clicking "Extract & Translate," please allow up to a minute for the processing to complete before the translated text is shown.
For users interested in custom translation models, the repository includes optional scripts for fine-tuning:
-
mT5:
A multilingual text-to-text transformer pre-trained on extensive multilingual datasets. It serves as a robust base model for translation tasks across multiple languages. -
VBLoRA:
Stands for Varying Bottleneck Low-Rank Adaptation. This technique allows for efficient fine-tuning by adapting only a small subset of model parameters, reducing computational costs and mitigating catastrophic forgetting.
The script llm/mt5_translation.py
demonstrates how to integrate VBLoRA with mT5 for resource-friendly fine-tuning.
Note: Due to high computational costs and resource limitations, initial experiments were paused after a few optimization trials. However, if you have sufficient resources, the code is fully runnable for training your own model. Feel free to explore and extend this capability.
-
422 Unprocessable Entity:
Verify that JSON requests to endpoints (e.g.,/translate
) conform to the expected Pydantic models. -
OpenAI API Errors:
Confirm that your OpenAI API key is valid and that the model configuration aligns with supported parameters.
- Built with FastAPI
- Powered by OpenAI
- Fine-tuning experiments use mT5 enhanced with VBLoRA.
- Special thanks to all contributors and the open-source community.