A real-time AI voice assistant powered by DeepSeek R1 that enables seamless voice conversations through speech-to-text transcription, AI response generation, and text-to-speech synthesis.
This project creates an interactive AI voice agent that:
- Captures and transcribes speech in real-time using AssemblyAI
- Generates intelligent responses using DeepSeek R1 (7B model) via Ollama
- Converts AI responses back to natural speech using ElevenLabs
- Streams audio responses for immediate playback
- Real-time Speech Recognition: High-quality speech-to-text transcription with AssemblyAI
- Advanced AI Responses: Powered by DeepSeek R1's reasoning capabilities
- Natural Voice Synthesis: Professional text-to-speech with ElevenLabs
- Streaming Audio Playback: Low-latency audio streaming for responsive conversations
- Conversation Memory: Maintains context throughout the conversation
- Cross-platform Support: Works on macOS, Linux, and Windows
- AssemblyAI API Key: Get your free API key
- ElevenLabs API Key: Sign up for ElevenLabs
Download and install Ollama from ollama.com
Ubuntu/Debian:
sudo apt update && sudo apt install portaudio19-dev
macOS:
brew install portaudio
Windows: PortAudio is typically included with the Python package installation.
brew install mpv
git clone https://github.com/danieladdisonorg/DeepSeek-R1-Voice-Agent.git
cd DeepSeek-R1-Voice-Agent
pip install "assemblyai[extras]" ollama elevenlabs
ollama pull deepseek-r1:7b
Edit AIVoiceAgent.py
and replace the placeholder API keys:
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"
self.client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")
python AIVoiceAgent.py
- Speak: The agent listens for your voice input
- Processing: Your speech is transcribed and sent to DeepSeek R1
- Response: The AI generates a response (limited to 300 characters for quick interactions)
- Playback: The response is converted to speech and played back
- Continue: The conversation continues with maintained context
Press Ctrl+C
to stop the voice agent.
- AI Model: DeepSeek R1 7B (configurable in the code)
- Voice Model: ElevenLabs Turbo v2 (configurable)
- Response Length: Limited to 300 characters (adjustable in system prompt)
- Sample Rate: 16kHz for optimal quality
- Modify the system prompt in
AIVoiceAgent.py
to change AI behavior - Adjust response length limits
- Change voice models in ElevenLabs configuration
- Modify audio streaming parameters
"No module named 'assemblyai'"
pip install "assemblyai[extras]"
"Ollama connection error"
- Ensure Ollama is running:
ollama serve
- Verify the model is downloaded:
ollama list
"Audio device not found"
- Check microphone permissions
- Verify PortAudio installation
- Test microphone with other applications
"ElevenLabs API error"
- Verify API key is correct
- Check API quota/usage limits
- Ensure stable internet connection
- Use a quality microphone for better transcription accuracy
- Ensure stable internet connection for API calls
- Close unnecessary applications to free up system resources
βββββββββββββββββββ ββββββββββββββββ βββββββββββββββββββ
β Microphone βββββΆβ AssemblyAI βββββΆβ DeepSeek R1 β
β (Audio Input) β β (Speech-to- β β (AI Response β
βββββββββββββββββββ β Text) β β Generation) β
ββββββββββββββββ βββββββββββββββββββ
β
βββββββββββββββββββ ββββββββββββββββ β
β Speakers ββββββ ElevenLabs ββββββββββββββββ
β (Audio Output) β β (Text-to- β
βββββββββββββββββββ β Speech) β
ββββββββββββββββ
This project is open source. Please check the repository for license details.
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
For issues and questions:
- Open an issue on GitHub
- Check the troubleshooting section above
- Review API documentation for AssemblyAI, Ollama, and ElevenLabs
Note: This project requires active internet connection for API services and sufficient system resources to run the DeepSeek R1 model locally via Ollama.