A powerful, offline Text-to-Speech (TTS) solution based on the Kokoro-82M model, featuring 44 high-quality voices across multiple languages and accents. This local implementation provides fast, reliable text-to-speech conversion with support for multiple output formats (WAV, MP3, AAC) and real-time generation progress display.
Watch and listen to a sample of our custom voice created by interpolating Nicole and Adam voices:
demo_video.mp4
- 🎙️ 44 high-quality voices across American English, British English, and other languages
- 💻 Completely offline operation - no internet needed after initial setup
- 📚 Support for PDF and TXT file input
- 🎵 Multiple output formats (WAV, MP3, AAC)
- ⚡ Generate custom voices instantly
- 🎛️ Adjustable speech speed (0.5x to 2.0x)
- 📊 Automatic text chunking for optimal processing
- 🎯 Easy-to-use interactive CLI interface
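The automatic chunking feature above can be sketched in a few lines. This is a hypothetical helper, not the project's actual implementation (tts_demo.py may chunk differently): it splits input at sentence boundaries so each chunk stays under a length cap before being fed to the model.

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks at sentence boundaries, keeping each under max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending would exceed the cap.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized independently, which keeps memory bounded and lets the CLI report per-chunk progress.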
Creating custom voices:
python custom_interpolation.py
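The script above is not reproduced here; as a rough illustration of what voice interpolation means, the sketch below blends two voice embeddings element-wise. In the real project each voice is a PyTorch tensor loaded from a voices/*.pt file; plain Python lists stand in so the example runs anywhere, and the function name is hypothetical, not the project's API:

```python
def interpolate_voices(voice_a, voice_b, alpha=0.5):
    """Blend two voice embeddings element-wise; alpha weights voice_a."""
    if len(voice_a) != len(voice_b):
        raise ValueError("voice embeddings must have the same shape")
    return [alpha * a + (1 - alpha) * b for a, b in zip(voice_a, voice_b)]

# In the real project the embeddings would come from e.g.
# torch.load("voices/af_nicole.pt") and torch.load("voices/am_adam.pt");
# short lists stand in here for illustration.
mixed = interpolate_voices([1.0, 0.0, 2.0], [0.0, 1.0, 0.0], alpha=0.5)
```

Varying alpha between 0 and 1 moves the result smoothly between the two source voices, which is presumably how the Nicole/Adam demo voice was produced.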
Creating audio books:
python audio_book.py
Note: install the prerequisites and complete the installation steps below before running these commands.
Before installing Kokoro TTS Local, make sure the following prerequisites are installed (step-by-step guides below):
- Python 3.10.0 or higher
- FFmpeg (for MP3/AAC conversion)
- CUDA-compatible GPU (optional, for faster generation)
- Git (for version control and package management)
- Install Git (Windows):
  winget install --id Git.Git -e --source winget
  Alternatively, download the installer from Git for Windows.
- Verify the installation:
  git --version
# Ubuntu/Debian
sudo apt update
sudo apt install git
# Fedora
sudo dnf install git
# Arch Linux
sudo pacman -S git
# Verify installation
git --version
# Using Homebrew
brew install git
# Verify installation
git --version
- Install FFmpeg (Windows, run in PowerShell):
  iex (irm ffmpeg.tc.ht)
- Verify the installation by opening a new Command Prompt:
  ffmpeg -version
# Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg
# Fedora
sudo dnf install ffmpeg
# Arch Linux
sudo pacman -S ffmpeg
# Using Homebrew
brew install ffmpeg
- Check your GPU compatibility (Windows):
  - Open Command Prompt and run:
    dxdiag
  - Go to the "Display" tab
  - Note your GPU model
- Download the CUDA Toolkit:
  - Visit NVIDIA CUDA Downloads
  - Select Windows and your version
  - Choose the "exe (network)" installer
  - Download and run the installer
- Installation steps:
  - Run the downloaded installer
  - Choose "Express Installation"
  - Wait for the installation to complete
  - Restart your computer
- Verify the installation:
  nvidia-smi
  nvcc --version
- Check your GPU compatibility (Linux):
  lspci | grep -i nvidia
- Remove old NVIDIA drivers (if any):
  sudo apt-get purge 'nvidia*'
- Add the NVIDIA package repositories:
  # Ubuntu 22.04 LTS (use the repository matching your release; these
  # 22.04 packages are not compatible with newer releases such as 24.04)
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
  sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
  wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
  sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
  sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
- Install the CUDA drivers:
  sudo apt-get update
  sudo apt-get -y install cuda-drivers
- Install the CUDA Toolkit:
  sudo apt-get install cuda
- Add CUDA to PATH:
  echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
  echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
  source ~/.bashrc
- Verify the installation:
  nvidia-smi
  nvcc --version
Note: For macOS, CUDA is not supported natively. The model will run on CPU only.
- Install Python 3.10.0:
  - Download the installer from Python's official website
  - During installation, check "Add Python to PATH"
  - Verify the installation:
    python --version
- Install espeak-ng:
  - Download the latest release from espeak-ng releases
  - Run the installer and follow the prompts
  - Add espeak-ng to your system PATH if it is not added automatically
- Clone the repository:
  git clone https://github.com/solveditnpc/kokoro-tts-local.git
  cd kokoro-tts-local
- Create and activate a virtual environment:
  python -m venv venv
  .\venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Install espeak-ng:
  sudo apt-get install espeak-ng
- Install system dependencies:
  sudo apt-get update
  sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
    libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
    libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev \
    liblzma-dev python3-openssl git
- Install pyenv:
  curl https://pyenv.run | bash
  # Add to ~/.bashrc
  echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
  echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
  echo 'eval "$(pyenv init -)"' >> ~/.bashrc
  # Reload the shell
  exec "$SHELL"
- Install Python 3.10.0 and clone the repository:
  pyenv install 3.10.0
  git clone https://github.com/solveditnpc/kokoro-tts-local.git
  cd kokoro-tts-local
  # Set the local Python version
  pyenv local 3.10.0
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Install system dependencies:
  # Install Homebrew if not already installed
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  brew install openssl readline sqlite3 xz zlib tcl-tk git
  # Install espeak-ng
  brew install espeak-ng
- Install pyenv:
  brew install pyenv
  # Add to ~/.zshrc (or ~/.bashrc if using bash)
  echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
  echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
  echo 'eval "$(pyenv init -)"' >> ~/.zshrc
  # Reload the shell
  exec "$SHELL"
- Install Python 3.10.0 and clone the repository:
  pyenv install 3.10.0
  git clone https://github.com/solveditnpc/kokoro-tts-local.git
  cd kokoro-tts-local
  # Set the local Python version
  pyenv local 3.10.0
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
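Whichever platform you installed on, a quick sanity check can save debugging time later. This hypothetical helper (not part of the repository) only verifies the prerequisites named above: Python 3.10+, FFmpeg, and Git.

```python
import shutil
import sys

def check_environment() -> list[str]:
    """Return a list of problems found with the local setup (empty list = OK)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(
            "Python 3.10+ required, found %d.%d" % sys.version_info[:2]
        )
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not on PATH (needed for MP3/AAC output)")
    if shutil.which("git") is None:
        problems.append("git not on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```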
The system includes 44 different voices across various categories:
- American English Female (af_*):
- af_alloy: Alloy - Clear and professional
- af_aoede: Aoede - Smooth and melodic
- af_bella: Bella - Warm and friendly
- af_jessica: Jessica - Natural and engaging
- af_kore: Kore - Bright and energetic
- af_nicole: Nicole - Professional and articulate
- af_nova: Nova - Modern and dynamic
- af_river: River - Soft and flowing
- af_sarah: Sarah - Casual and approachable
- af_sky: Sky - Light and airy
- American English Male (am_*):
- am_adam: Adam - Strong and confident
- am_echo: Echo - Resonant and clear
- am_eric: Eric - Professional and authoritative
- am_fenrir: Fenrir - Deep and powerful
- am_liam: Liam - Friendly and conversational
- am_michael: Michael - Warm and trustworthy
- am_onyx: Onyx - Rich and sophisticated
- am_puck: Puck - Playful and energetic
- British English Female (bf_*):
- bf_alice: Alice - Refined and elegant
- bf_emma: Emma - Warm and professional
- bf_isabella: Isabella - Sophisticated and clear
- bf_lily: Lily - Sweet and gentle
- British English Male (bm_*):
- bm_daniel: Daniel - Polished and professional
- bm_fable: Fable - Storytelling and engaging
- bm_george: George - Classic British accent
- bm_lewis: Lewis - Modern British accent
- French Female (ff_*):
- ff_siwis: Siwis - French accent
- High-pitched Voices:
  - Female (hf_*):
    - hf_alpha: Alpha - Higher female pitch
    - hf_beta: Beta - Alternative high female pitch
  - Male (hm_*):
    - hm_omega: Omega - Higher male pitch
    - hm_psi: Psi - Alternative high male pitch
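As the lists above show, the prefix of each voice ID encodes its category. A small hypothetical helper (not in the repository) that decodes the prefix:

```python
# Prefix-to-category table derived from the voice lists above.
PREFIX_CATEGORIES = {
    "af": "American English, female",
    "am": "American English, male",
    "bf": "British English, female",
    "bm": "British English, male",
    "ff": "French, female",
    "hf": "high-pitched, female",
    "hm": "high-pitched, male",
}

def describe_voice(voice_id: str) -> str:
    """Map a voice ID like 'af_nicole' to a human-readable description."""
    prefix, _, name = voice_id.partition("_")
    category = PREFIX_CATEGORIES.get(prefix, "unknown category")
    return f"{name.capitalize()} ({category})"
```

For example, describe_voice("af_nicole") yields "Nicole (American English, female)".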
.
├── .cache/ # Cache directory for downloaded models
│ └── huggingface/ # Hugging Face model cache
├── .git/ # Git repository data
├── .gitignore # Git ignore rules
├── __pycache__/ # Python cache files
├── voices/ # Voice model files (downloaded on demand)
│ └── *.pt # Individual voice files
├── venv/ # Python virtual environment
├── outputs/ # Generated audio files directory
├── LICENSE # Apache 2.0 License file
├── README.md # Project documentation
├── models.py # Core TTS model implementation
├── gradio_interface.py # Web interface implementation
├── config.json # Model configuration file
├── requirements.txt # Python dependencies
└── tts_demo.py # CLI implementation
The project uses the latest Kokoro model from Hugging Face:
- Repository: hexgrad/Kokoro-82M
- Model file: kokoro-v1_0.pth (downloaded automatically)
- Sample rate: 24 kHz
- Voice files: located in the voices/ directory (downloaded automatically)
- Available voices: 44 voices across multiple categories
- Languages: American English ('a'), British English ('b')
- Model size: 82M parameters
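With the fixed 24 kHz output sample rate, audio length and raw file size follow from simple arithmetic. A short sketch (the helper names are made up for illustration, not project APIs):

```python
SAMPLE_RATE = 24_000  # Kokoro output sample rate (24 kHz)

def audio_duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a mono PCM buffer in seconds."""
    return num_samples / sample_rate

def wav_payload_bytes(seconds: float, sample_rate: int = SAMPLE_RATE,
                      bytes_per_sample: int = 2) -> int:
    """Approximate payload size of 16-bit mono WAV audio (header excluded)."""
    return int(seconds * sample_rate * bytes_per_sample)
```

So one minute of generated speech is roughly 60 * 24000 * 2 ≈ 2.9 MB as uncompressed WAV, which is why MP3/AAC output is handy for audiobooks.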
Common issues and solutions:
- Model Download Issues
  - Ensure a stable internet connection
  - Check that Hugging Face is accessible
  - Verify sufficient disk space
  - Try clearing the .cache/huggingface directory
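Clearing the cache from the last step can be scripted. A minimal sketch, assuming the default .cache/huggingface location relative to the project root (the function is hypothetical, not part of the project):

```python
import shutil
from pathlib import Path

def clear_model_cache(cache_dir: str = ".cache/huggingface") -> bool:
    """Delete the model cache so the next run re-downloads; True if removed."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

The next run then re-downloads kokoro-v1_0.pth and any voice files from Hugging Face.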
- CUDA/GPU Issues
  - Verify the CUDA installation with nvidia-smi
  - Update GPU drivers
  - Check PyTorch CUDA compatibility
  - Fall back to CPU if needed
- Audio Output Issues
  - Check system audio settings
  - Verify output directory permissions
  - Install FFmpeg for MP3/AAC support
  - Try different output formats
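FFmpeg handles the MP3/AAC conversion. One way to sketch that step, choosing the codec from the output file extension (a hypothetical helper, not necessarily how the project invokes FFmpeg):

```python
def ffmpeg_convert_command(wav_path: str, out_path: str) -> list[str]:
    """Build an ffmpeg command that converts a generated WAV to MP3 or AAC,
    selected by the output file extension."""
    codecs = {".mp3": "libmp3lame", ".aac": "aac"}
    ext = out_path[out_path.rfind("."):].lower()
    if ext not in codecs:
        raise ValueError(f"unsupported output format: {ext}")
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", wav_path, "-c:a", codecs[ext], out_path]

# To actually run the conversion:
#   subprocess.run(ffmpeg_convert_command("outputs/speech.wav",
#                                         "outputs/speech.mp3"), check=True)
```

If this step fails, run ffmpeg -version first to confirm FFmpeg is on PATH.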
- Voice File Issues
  - Delete voice files and let the system redownload them
  - Check voices/ directory permissions
  - Verify voice file integrity
  - Try a different voice
Feel free to contribute by:
- Opening issues for bugs or feature requests
- Submitting pull requests with improvements
- Helping with documentation
- Testing different voices and reporting issues
- Suggesting new features or optimizations
- Testing on different platforms and reporting results
Apache 2.0 - See LICENSE file for details