Welcome to the Persian Voice Command System, a Python project that lets Persian speakers control their Windows computers with voice commands. The application records a spoken command in Persian, transcribes it into text with a speech-recognition model fine-tuned for Persian, and executes the matching Windows command from a restricted set. Its Streamlit web interface keeps the workflow simple, making the project accessible both to everyday users and to developers interested in AI-driven applications.
The system is optimized for Persian language processing, ensuring accurate transcription and command execution. By using local AI models, it prioritizes user privacy and supports offline functionality after initial setup, making it a robust solution for voice-based computer control.
- Voice Recording: Capture 5-second voice commands in Persian via a simple web interface.
- Speech-to-Text Conversion: Transcribe audio into Persian text using the Persian-optimized Wav2Vec2 model.
- Command Execution: Process transcribed text into secure Windows commands using Ollama with the `qwen2.5` model.
- Secure Execution: Restrict commands to a predefined set, including:
  - Starting applications: Chrome, Firefox, Visual Studio Code, File Explorer
  - Opening directories: Downloads, Documents
  - Terminating processes: Chrome, Visual Studio Code
- User-Friendly Interface: Streamlit provides an intuitive web-based interface for seamless interaction.
- Offline Capability: Runs locally with downloaded models, ensuring privacy and functionality without internet dependency after setup.
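The voice recording step can be sketched with PyAudio and the standard-library `wave` module. This is a minimal sketch, not the project's `app.py`. The 16 kHz mono 16-bit format, the chunk size, and the helper names `record_command` and `save_wav` are all assumptions. Recording at 16 kHz is a deliberate choice, because Wav2Vec2 models expect 16 kHz input and this avoids a resampling step later.

```python
import math
import wave

RATE = 16_000      # assumed sample rate; Wav2Vec2 models expect 16 kHz input
CHANNELS = 1       # mono
CHUNK = 1024       # frames read per buffer
SAMPLE_WIDTH = 2   # bytes per sample for 16-bit PCM (pyaudio.paInt16)


def save_wav(path, frames, rate=RATE, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Write raw PCM byte chunks to a WAV file with the standard-library wave module."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def record_command(path="command.wav", seconds=5):
    """Record `seconds` of microphone audio and save it as WAV (hypothetical helper)."""
    import pyaudio  # imported lazily so save_wav works without PyAudio installed

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        n_chunks = math.ceil(RATE * seconds / CHUNK)  # enough chunks to cover the duration
        frames = [stream.read(CHUNK) for _ in range(n_chunks)]
    finally:
        # Always release the microphone, even if reading fails.
        stream.stop_stream()
        stream.close()
        pa.terminate()
    save_wav(path, frames)
    return path
```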
The system integrates three core components:
- Voice Recording: Uses PyAudio to record 5 seconds of audio from the user's microphone, saving it as a WAV file.
- Speech-to-Text Conversion: Employs the `m3hrdadfi/wav2vec2-large-xlsr-persian` model from Hugging Face to transcribe audio into Persian text, with audio preprocessing handled by Torchaudio.
- Command Execution: Processes transcribed text using Ollama's `qwen2.5` model to generate Windows commands, which are executed securely via Python's `subprocess` module.
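The speech-to-text component can be sketched as follows, under a few assumptions:

- It loads the `m3hrdadfi/wav2vec2-large-xlsr-persian` checkpoint through the Transformers `Wav2Vec2Processor` and `Wav2Vec2ForCTC` classes.
- It uses Torchaudio to down-mix to mono and resample to 16 kHz.
- It decodes greedily, taking the most likely token per frame.

The function name `transcribe` is hypothetical, and the project may preprocess or decode audio differently.

```python
MODEL_ID = "m3hrdadfi/wav2vec2-large-xlsr-persian"
TARGET_RATE = 16_000  # Wav2Vec2 XLSR models are trained on 16 kHz audio


def transcribe(path, model_id=MODEL_ID):
    """Transcribe a WAV file to Persian text with a Wav2Vec2 CTC model (sketch)."""
    # Heavy dependencies are imported lazily; they are only needed when transcribing.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
    waveform = waveform.mean(dim=0)                 # down-mix to mono
    if sample_rate != TARGET_RATE:
        waveform = torchaudio.functional.resample(waveform, sample_rate, TARGET_RATE)

    inputs = processor(waveform.numpy(), sampling_rate=TARGET_RATE,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)    # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```

Loading the processor and model on every call is slow. In a Streamlit app they would typically be loaded once, for example inside a function decorated with `st.cache_resource`.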
The workflow is managed by LangGraph, ensuring a structured process from input to output, all accessible through a Streamlit interface.
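The LangGraph workflow can be pictured as a linear graph of three nodes that share one state dictionary. In this sketch the stage functions are passed in as arguments, so they can be swapped or stubbed. The state keys (`audio_path`, `text`, `command`, `result`) and the name `build_workflow` are illustrative assumptions, not the project's actual identifiers.

```python
from typing import TypedDict


class CommandState(TypedDict, total=False):
    audio_path: str  # WAV file produced by the recorder
    text: str        # Persian transcription
    command: str     # command chosen for execution
    result: str      # message shown in the UI


def build_workflow(transcribe, generate_command, execute):
    """Wire the three stages into a linear LangGraph graph (sketch)."""
    from langgraph.graph import END, START, StateGraph

    graph = StateGraph(CommandState)
    # Each node reads the shared state and returns only the keys it updates.
    graph.add_node("transcribe", lambda state: {"text": transcribe(state["audio_path"])})
    graph.add_node("generate", lambda state: {"command": generate_command(state["text"])})
    graph.add_node("execute", lambda state: {"result": execute(state["command"])})
    graph.add_edge(START, "transcribe")
    graph.add_edge("transcribe", "generate")
    graph.add_edge("generate", "execute")
    graph.add_edge("execute", END)
    return graph.compile()
```

Calling `.invoke({"audio_path": "command.wav"})` on the compiled graph runs the three stages in order. It returns the final state, whose `result` key holds the message to display.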
- Operating System: Windows (required for command execution).
- Python: Version 3.8 or higher.
- Microphone: A working microphone for voice input.
- Internet: Required for initial setup to download models.
- Hardware: GPU recommended for better performance, but CPU is sufficient.
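The operating-system and Python-version requirements can be checked before installing anything. The helper below is hypothetical and not part of the repository. Its `system` and `version` parameters exist only so the check can be exercised on any machine.

```python
import platform
import sys


def check_prerequisites(system=None, version=None):
    """Return a list of unmet prerequisites; an empty list means all checks passed."""
    system = system if system is not None else platform.system()
    version = version if version is not None else sys.version_info[:2]
    problems = []
    if system != "Windows":
        problems.append(f"Windows is required for command execution (found {system}).")
    if tuple(version) < (3, 8):
        problems.append(f"Python 3.8+ is required (found {version[0]}.{version[1]}).")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```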
Follow these steps to set up the Persian Voice Command System:
- Install Python 3.8 or Higher
  - Download and install it from Python.org.
- Install Required Libraries
  - Run the following command to install dependencies: `pip install streamlit pyaudio transformers torchaudio torch langchain-core langchain-ollama langgraph`
  - Note for Windows users: PyAudio requires PortAudio. Install it and ensure it is in your system PATH.
- Install Ollama
  - Follow the instructions on Ollama's website.
  - Pull the required model: `ollama pull qwen2.5`
  - Start Ollama: `ollama serve`
- Clone the Repository
  - `git clone https://github.com/armanjscript/Persian-Voice-Command-System.git`
- Navigate to the Project Directory
  - `cd Persian-Voice-Command-System`
- Run the Application
  - `streamlit run app.py`
Note: A GPU enhances performance for speech-to-text and command generation, but the system runs on CPU as well. Ensure a stable internet connection for initial model downloads.
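Before launching the app, it can help to confirm that the Ollama server is running and that `qwen2.5` has been pulled. The sketch below queries Ollama's local `/api/tags` endpoint, which lists pulled models, at the default address `http://localhost:11434`. The helper names are hypothetical and not part of the repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def model_is_pulled(tags, model="qwen2.5"):
    """Check an /api/tags response for `model`, ignoring tag suffixes such as ':latest'."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(name == model or name.split(":")[0] == model for name in names)


def check_ollama(model="qwen2.5", base_url=OLLAMA_URL, timeout=5):
    """Return (ok, message) describing whether Ollama is up and the model is available."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except OSError as exc:  # URLError subclasses OSError: refused connections, timeouts
        return False, f"Ollama is not reachable at {base_url} ({exc}); run `ollama serve`."
    if not model_is_pulled(tags, model):
        return False, f"Model {model!r} not found; run `ollama pull {model}`."
    return True, f"Ollama is running and {model!r} is available."
```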
- Launch the Application
  - Run `streamlit run app.py` to open the app in your default web browser.
- Interact with the Interface
  - Click "Record Command (5 seconds)" to record a voice command in Persian.
  - The recorded audio plays back for verification.
  - Click "Process Command" to transcribe the audio and execute the command.
  - View the transcribed text and the command execution result in the interface.
- Supported Commands
  - Examples of supported Persian commands:
    - "مرورگر را باز کن" (Open the Chrome browser)
    - "دانلودها" (Open the Downloads folder)
    - "کد را ببند" (Close Visual Studio Code)
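One way to turn a transcription such as "دانلودها" into a safe command is to have the model choose from a fixed list of keys and to reject any other reply. The sketch below uses LangChain's `ChatPromptTemplate` and `StrOutputParser` with `ChatOllama` from `langchain-ollama`. The key names, the prompt wording, and the functions `generate_command` and `parse_command_key` are assumptions. The project's actual prompt may ask the model for a different output format.

```python
# Hypothetical command keys mirroring the supported-command list.
ALLOWED_KEYS = (
    "start_chrome", "start_firefox", "start_vscode", "start_explorer",
    "open_downloads", "open_documents", "close_chrome", "close_vscode",
)

SYSTEM_PROMPT = (
    "You translate Persian voice commands into exactly one command key. "
    "Allowed keys: {keys}. Reply with the key only, or NONE if nothing matches."
)


def parse_command_key(raw, allowed=ALLOWED_KEYS):
    """Normalize the model's reply and accept it only if it is an allowed key."""
    key = raw.strip().strip("`'\"").lower()
    return key if key in allowed else None


def generate_command(text, model="qwen2.5"):
    """Ask a local Ollama model to map Persian text to a command key (sketch)."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_ollama import ChatOllama

    prompt = ChatPromptTemplate.from_messages(
        [("system", SYSTEM_PROMPT), ("human", "{text}")]
    )
    # temperature=0 keeps the choice as deterministic as the model allows.
    chain = prompt | ChatOllama(model=model, temperature=0) | StrOutputParser()
    raw = chain.invoke({"keys": ", ".join(ALLOWED_KEYS), "text": text})
    return parse_command_key(raw)
```

Having the model return a key rather than a raw command string means the executor never runs text authored by the model. The worst an unexpected reply can do is select another allowed action or nothing at all.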
Important Notes:
- Ensure a working microphone is connected.
- Use Persian commands for optimal transcription accuracy.
- The system restricts commands to a safe set to prevent unauthorized actions.
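The two-button flow described under Usage can be sketched in Streamlit as shown below. Two things are assumptions in this sketch:

- `record()` returns the path of a saved WAV file.
- `process(path)` returns a dict with `text` and `result` keys.

Both callables and the name `render_ui` are hypothetical. Because Streamlit reruns the whole script on every click, the recording's path is kept in `st.session_state` between reruns.

```python
def render_ui(record, process):
    """Draw the record/process interface described under Usage (sketch)."""
    import streamlit as st  # imported lazily; only needed under `streamlit run`

    st.title("Persian Voice Command System")

    if st.button("Record Command (5 seconds)"):
        with st.spinner("Recording..."):
            st.session_state["audio_path"] = record()

    audio_path = st.session_state.get("audio_path")
    if audio_path:
        st.audio(audio_path)  # play back the recording for verification

    # The second button stays disabled until something has been recorded.
    if st.button("Process Command", disabled=audio_path is None):
        outcome = process(audio_path)
        st.write("Transcribed text:", outcome["text"])
        st.write("Execution result:", outcome["result"])
```

In an app script this would be called once at module level with the app's own recording and processing functions.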
Technology | Role |
---|---|
Python | Primary programming language. |
Streamlit | Creates the interactive web interface. |
PyAudio | Records audio from the microphone. |
Transformers | Handles speech-to-text conversion with Wav2Vec2 model. |
Torchaudio | Processes audio files for transcription. |
Torch | Runs the speech-to-text model on CPU or GPU. |
LangChain | Manages prompts and output parsing for command generation. |
LangGraph | Structures the workflow for efficient task management. |
Ollama | Runs the local language model for command processing. |
Subprocess | Executes Windows commands securely. |
The system prioritizes security by restricting command execution to a predefined set of safe commands. This prevents malicious or unauthorized actions, limiting operations to starting specific applications (Chrome, Firefox, VS Code, Explorer), opening designated directories (Downloads, Documents), and terminating specific processes (Chrome, VS Code).
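This restriction can be enforced with a lookup table that maps command keys to fixed argument lists, passed to `subprocess.run` with `shell=False` (Python's default). That way no text from the transcription or the model is ever parsed as a command line. The keys and exact Windows commands below are illustrative assumptions covering the listed actions, not the project's own table: `start` via `cmd /c`, `explorer`, and `taskkill /IM ... /F`. The Downloads and Documents paths assume the default user-profile folders.

```python
import subprocess
from pathlib import Path

# Hypothetical whitelist. `start` is a cmd.exe built-in, so those entries call
# cmd explicitly; every argument is fixed and nothing user-supplied is inserted.
ALLOWED_COMMANDS = {
    "start_chrome": ["cmd", "/c", "start", "", "chrome"],
    "start_firefox": ["cmd", "/c", "start", "", "firefox"],
    "start_vscode": ["cmd", "/c", "code"],
    "start_explorer": ["explorer"],
    "open_downloads": ["explorer", str(Path.home() / "Downloads")],
    "open_documents": ["explorer", str(Path.home() / "Documents")],
    "close_chrome": ["taskkill", "/IM", "chrome.exe", "/F"],
    "close_vscode": ["taskkill", "/IM", "Code.exe", "/F"],
}


def execute_command(key, dry_run=False):
    """Run a whitelisted command by key and refuse anything else.

    With dry_run=True the argument list is returned instead of executed.
    """
    argv = ALLOWED_COMMANDS.get(key)
    if argv is None:
        raise ValueError(f"Command {key!r} is not in the allowed set")
    if dry_run:
        return argv
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return completed.returncode
```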
Contributions are welcome! To contribute:
- Fork the repository on GitHub.
- Create a new branch for your changes.
- Submit a pull request with a clear description.
- For bug reports or feature requests, open an issue on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, open an issue on GitHub or email [[email protected]].