A Python command-line tool to convert ZIM files (used by Kiwix and others for offline content) to EPUB format for e-readers.
- Convert ZIM files to EPUB format with robust error handling
- Option to include or exclude images
- Automatic table of contents generation based on article names
- Limit the number of articles to include
- Preserves metadata from the ZIM file
- Clean, readable formatting for e-readers
- Handles URL-encoded paths and special characters
- Supports various ZIM file structures and formats
- Extracts content from main entry when standard article paths aren't available
- Avoids duplicate images in the output EPUB
- Full crawl mode for problematic ZIM files
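As an illustration of the URL-encoded path handling listed above, here is a minimal sketch using only the standard library. The helper name `normalize_entry_path` is hypothetical and is not taken from the zim2epub source; the package's own normalization may differ.

```python
from urllib.parse import unquote


def normalize_entry_path(raw_path: str) -> str:
    """Turn a link target found in an article into a plain entry path.

    Hypothetical helper for illustration only.
    """
    # Drop the fragment and query string first, so that an encoded "%23"
    # or "%3F" inside the entry name is not mistaken for a separator.
    path = raw_path.split("#", 1)[0].split("?", 1)[0]
    # Decode percent-escapes such as %20 or %C3%A9 (UTF-8 by default).
    path = unquote(path)
    # Links inside articles are often relative; strip a leading "./" or "/".
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")


print(normalize_entry_path("A/Caf%C3%A9_au_lait#History"))  # prints: A/Café_au_lait
```

Stripping the fragment before decoding is deliberate: decoding first would turn an escaped `%23` into a literal `#` and truncate the entry name.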
This package is compatible with:
- Linux (Debian, Ubuntu, Fedora, etc.)
- macOS
Note: Windows is not currently supported due to limitations with the libzim library.
- Package Structure: Reorganized into a proper Python package structure for better maintainability
- Improved URL handling: Added support for URL-encoded paths and special characters
- Enhanced image processing: Fixed issues with duplicate images and improved mimetype detection
- Better article extraction: Added multiple methods to extract articles from different ZIM file structures
- Robust error handling: Added comprehensive error handling and fallback mechanisms
- Detailed logging: Added verbose logging to help diagnose issues
- CI/CD Pipeline: Added GitHub Actions for automated testing and releases
- Full crawl mode: Added option to crawl through all entries in the ZIM file
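The image-related items above (duplicate avoidance and mimetype detection) can be sketched roughly as follows. This is an assumption about one reasonable approach, keyed on a content hash, and not the package's actual implementation; `ImageRegistry` and its method names are hypothetical.

```python
import hashlib
import mimetypes
from pathlib import PurePosixPath


class ImageRegistry:
    """Track images by content so identical bytes are stored only once.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self._names_by_digest = {}

    def add(self, entry_path, data):
        """Return (epub_name, media_type, is_new) for an image."""
        digest = hashlib.sha256(data).hexdigest()
        # Guess the media type from the file extension, with a generic fallback.
        media_type, _ = mimetypes.guess_type(entry_path)
        media_type = media_type or "application/octet-stream"
        if digest in self._names_by_digest:
            # Same bytes seen before: reuse the existing file name.
            return self._names_by_digest[digest], media_type, False
        suffix = PurePosixPath(entry_path).suffix
        epub_name = f"images/{digest[:16]}{suffix}"
        self._names_by_digest[digest] = epub_name
        return epub_name, media_type, True


registry = ImageRegistry()
first = registry.add("I/logo.png", b"fake image bytes")
second = registry.add("I/mirror/logo.png", b"fake image bytes")
print(first[0] == second[0], second[2])  # prints: True False
```

Hashing the content rather than comparing paths catches the case where the same image is stored under several entry paths.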
- Python 3.6 or higher
- C++ libzim library (required for the Python bindings)
- Linux or macOS operating system
brew install libzim  # macOS (Homebrew)
apt-get install libzim-dev  # Debian/Ubuntu
dnf install libzim-devel  # Fedora
You can install the package directly from PyPI:
USE_SYSTEM_LIBZIM=1 pip install zim2epub
- Clone this repository:
git clone https://github.com/izzoa/zim2epub.git
cd zim2epub
- Install the required dependencies:
USE_SYSTEM_LIBZIM=1 pip install -r requirements.txt
- Install the package:
pip install .
Or in development mode:
pip install -e .
python -m zim2epub.cli path/to/your/file.zim
This will create an EPUB file with the same name as the input file in the current directory.
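The default naming rule can be sketched with `pathlib`. This is a minimal sketch, assuming the input file name is reused with an `.epub` extension and placed in the current working directory, as the sentence above states. Note that the help text describes the default as the input path with an `.epub` extension, so the real behavior may differ. `default_output_path` is a hypothetical name.

```python
from pathlib import Path


def default_output_path(zim_path, output=None):
    """Pick the EPUB path: an explicit output wins, else derive it from the input.

    Hypothetical helper for illustration only.
    """
    if output:
        return Path(output)
    # Keep only the file name, swap the extension, and place the result
    # in the current working directory.
    return Path.cwd() / Path(zim_path).with_suffix(".epub").name


print(default_output_path("/data/zims/wikipedia.zim").name)  # prints: wikipedia.epub
```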
usage: python -m zim2epub.cli [-h] [-o OUTPUT] [--no-images] [--no-toc]
                              [--max-articles MAX_ARTICLES] [-v] [--full-crawl]
                              zim_file

Convert ZIM files to EPUB format

positional arguments:
  zim_file              Path to the ZIM file to convert

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path for the output EPUB file (default: same as input with .epub extension)
  --no-images           Do not include images in the EPUB (default: False)
  --no-toc              Do not generate a table of contents (default: False)
  --max-articles MAX_ARTICLES
                        Maximum number of articles to include (default: None)
  -v, --verbose         Show verbose output (default: False)
  --full-crawl          Use full crawl mode to extract all articles (default: False)
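For readers who want to see how these options fit together, the following is a sketch of an `argparse` parser reconstructed from the help output above. It is not the project's actual `cli` module. `build_parser` and `to_converter_kwargs` are hypothetical names; the keyword names in the returned dictionary are the `ZimToEpub` parameters documented in this README.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="python -m zim2epub.cli",
        description="Convert ZIM files to EPUB format",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("zim_file", help="Path to the ZIM file to convert")
    parser.add_argument("-o", "--output", help="Path for the output EPUB file")
    parser.add_argument("--no-images", action="store_true",
                        help="Do not include images in the EPUB")
    parser.add_argument("--no-toc", action="store_true",
                        help="Do not generate a table of contents")
    parser.add_argument("--max-articles", type=int, default=None,
                        help="Maximum number of articles to include")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show verbose output")
    parser.add_argument("--full-crawl", action="store_true",
                        help="Use full crawl mode to extract all articles")
    return parser


def to_converter_kwargs(args):
    """Translate the negative CLI flags into the positive converter options."""
    return {
        "zim_path": args.zim_file,
        "output_path": args.output,
        "include_images": not args.no_images,
        "generate_toc": not args.no_toc,
        "max_articles": args.max_articles,
        "verbose": args.verbose,
        "full_crawl": args.full_crawl,
    }


args = build_parser().parse_args(["wikipedia.zim", "--no-images", "--max-articles", "100"])
print(args.no_images, args.max_articles)  # prints: True 100
```

The flags are phrased negatively (`--no-images`, `--no-toc`) so that the default invocation produces the richest output, while the converter options stay positive (`include_images`, `generate_toc`).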
Convert a ZIM file without images (useful for smaller file size):
python -m zim2epub.cli wikipedia.zim --no-images
Convert a ZIM file with a custom output path:
python -m zim2epub.cli wikipedia.zim -o my-wikipedia.epub
Convert only the first 100 articles of a ZIM file:
python -m zim2epub.cli wikipedia.zim --max-articles 100
Enable verbose output for debugging:
python -m zim2epub.cli wikipedia.zim -v
Enable full crawl mode for problematic ZIM files:
python -m zim2epub.cli wikipedia.zim --full-crawl
You can also use the ZimToEpub class directly in your Python code:
from zim2epub import ZimToEpub

converter = ZimToEpub(
    zim_path="path/to/file.zim",
    output_path="output.epub",
    include_images=True,
    generate_toc=True,
    max_articles=None,
    verbose=True,
    full_crawl=False,  # Set to True to use full crawl mode
)
output_path = converter.convert()
print(f"EPUB created at: {output_path}")
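Building on the example above, several files could be converted in one run. The sketch below is an assumption about how one might wrap the class, not part of the package: `convert_many` is a hypothetical helper, and it takes the converter class as an argument so that it does not depend on anything beyond the constructor parameters and `convert()` method shown in this README.

```python
from pathlib import Path


def convert_many(zim_paths, converter_cls, **options):
    """Convert each ZIM file, collecting the output path or the error per input.

    Hypothetical helper: converter_cls is expected to accept zim_path and
    output_path keyword arguments and to expose convert().
    """
    results = {}
    for zim_path in zim_paths:
        # Write each EPUB next to its source file.
        output_path = str(Path(zim_path).with_suffix(".epub"))
        try:
            converter = converter_cls(
                zim_path=str(zim_path), output_path=output_path, **options
            )
            results[str(zim_path)] = converter.convert()
        except Exception as exc:
            # One bad file should not abort the whole batch.
            results[str(zim_path)] = exc
    return results


# Intended usage, assuming zim2epub is installed:
#   from zim2epub import ZimToEpub
#   results = convert_many(["a.zim", "b.zim"], ZimToEpub, include_images=False)
```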
pytest
python -m build
- Update the version in zim2epub/__init__.py
- Create a new tag:
git tag -a v0.1.0 -m "Release v0.1.0"
- Push the tag:
git push origin v0.1.0
The GitHub Actions workflow will automatically build and publish the release to PyPI.
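A mismatch between the tag and the package version is an easy mistake in this process. The following sketch shows one way to check it before pushing. It assumes the version is stored as a `__version__ = "..."` assignment in `zim2epub/__init__.py` and that tags are the version prefixed with `v`; neither detail is confirmed by this README beyond the `v0.1.0` example, and both function names are hypothetical.

```python
import re

# Matches a line such as: __version__ = "0.1.0"
VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_source):
    """Extract the version string from the text of an __init__.py file."""
    match = VERSION_RE.search(init_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag, init_source):
    """Return True when the tag is exactly 'v' followed by the package version."""
    return tag == f"v{read_version(init_source)}"


sample = '"""zim2epub package."""\n__version__ = "0.1.0"\n'
print(tag_matches_version("v0.1.0", sample))  # prints: True
```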
If you encounter issues:
- Try running with the -v flag to see detailed logs
- Make sure you have the C++ libzim library installed
- Check that your ZIM file is valid and not corrupted
- For image issues, try using the --no-images flag
- For problematic ZIM files, try using the --full-crawl flag
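As a quick first check for an invalid file, the ZIM header can be inspected directly: a ZIM file begins with the 4-byte magic number 72173914 (0x044D495A), stored little-endian. The sketch below only confirms that the file starts like a ZIM archive; it does not detect truncation or corruption further in. `looks_like_zim` is a hypothetical helper, not part of zim2epub.

```python
import struct

ZIM_MAGIC = 72173914  # 0x044D495A, stored little-endian at offset 0


def looks_like_zim(path):
    """Return True if the file starts with the ZIM magic number."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    # A file shorter than 4 bytes cannot be a ZIM archive.
    if len(header) < 4:
        return False
    (magic,) = struct.unpack("<I", header)
    return magic == ZIM_MAGIC
```

A file that fails this check was most likely mislabeled or only partially downloaded, which is worth ruling out before trying `--full-crawl`.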
- Python 3.6 or higher
- libzim (Python bindings for the ZIM file format)
- EbookLib (for EPUB creation)
- BeautifulSoup4 (for HTML parsing)
- tqdm (for progress bars)
- lxml (for XML processing)
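To see which of these dependencies are missing from the current environment, a small check along these lines can help. It is a sketch, not part of the package: the mapping pairs each name listed above with the module name it is imported under (`ebooklib` for EbookLib, `bs4` for BeautifulSoup4), and `missing_dependencies` is a hypothetical helper.

```python
from importlib.util import find_spec

# Display name -> importable top-level module name.
REQUIRED_MODULES = {
    "libzim": "libzim",
    "EbookLib": "ebooklib",
    "BeautifulSoup4": "bs4",
    "tqdm": "tqdm",
    "lxml": "lxml",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of modules that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without importing anything.
    return [name for name, module in modules.items() if find_spec(module) is None]


missing = missing_dependencies()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies found")
```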
This project is licensed under the MIT License - see the LICENSE file for details.