Persian Voice Command System

Description

The Persian Voice Command System is a Python project that lets Persian speakers control a Windows computer by voice. The application records a spoken command in Persian, transcribes it to text with a fine-tuned Wav2Vec2 model, and runs the corresponding Windows command from a restricted set. A Streamlit web interface keeps it accessible to everyday users as well as developers interested in AI-driven applications.

The system is built around Persian speech recognition. Because the models run locally, audio and transcripts stay on the user's machine, and the system works offline once the models have been downloaded during initial setup.

Features

Voice Recording: Capture 5-second voice commands in Persian via a simple web interface.
Speech-to-Text Conversion: Transcribe audio into Persian text using the Persian-optimized Wav2Vec2 model.
Command Execution: Process transcribed text into secure Windows commands using Ollama with the qwen2.5 model.
Secure Execution: Restrict commands to a predefined set, including:
- Starting applications: Chrome, Firefox, Visual Studio Code, File Explorer
- Opening directories: Downloads, Documents
- Terminating processes: Chrome, Visual Studio Code
User-Friendly Interface: Streamlit provides an intuitive web-based interface for seamless interaction.
Offline Capability: Runs locally with downloaded models, ensuring privacy and functionality without internet dependency after setup.

How It Works

The system integrates three core components:

Voice Recording: Uses PyAudio to record 5 seconds of audio from the user's microphone, saving it as a WAV file.
Speech-to-Text Conversion: Employs the m3hrdadfi/wav2vec2-large-xlsr-persian model from Hugging Face to transcribe audio into Persian text, with audio preprocessing handled by Torchaudio.
Command Execution: Processes transcribed text using Ollama's qwen2.5 model to generate Windows commands, executed securely via Python's subprocess module.
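
As a rough illustration of the recording step, the sketch below writes five seconds of 16-bit mono audio to a WAV file using only the standard library. The `write_wav` and `wav_duration` helpers, the file name, and the 16 kHz sample rate are assumptions for illustration (16 kHz is the input rate Wav2Vec2 XLSR models expect; the rate the project actually records at is not stated here). Silent frames stand in for the microphone data that PyAudio would supply.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 16_000   # assumed; Wav2Vec2 XLSR models expect 16 kHz input
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
CHANNELS = 1           # mono
DURATION_S = 5         # matches the 5-second recording window


def write_wav(path, pcm_bytes, rate=SAMPLE_RATE):
    """Write raw 16-bit mono PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(rate)
        wf.writeframes(pcm_bytes)


def wav_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Silence stands in for the frames a microphone stream would return.
silence = b"\x00" * (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * DURATION_S)
demo_path = os.path.join(tempfile.gettempdir(), "command_demo.wav")
write_wav(demo_path, silence)
print(wav_duration(demo_path))
```

In a real recording loop, the chunks read from the microphone stream would be concatenated and passed to `write_wav` in place of `silence`.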

The workflow is managed by LangGraph, ensuring a structured process from input to output, all accessible through a Streamlit interface.
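
The flow from recording to transcription to execution can be pictured as a sequence of steps that each read and update a shared state. The sketch below uses plain Python functions rather than LangGraph itself, and every step is a stub: the state keys, function names, and returned strings are hypothetical and not taken from the project's code.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def record_step(state: State) -> State:
    # Stub: the real step would record 5 seconds of audio to this path.
    return {**state, "audio_path": "command.wav"}


def transcribe_step(state: State) -> State:
    # Stub: the real step would run the Wav2Vec2 model on state["audio_path"].
    return {**state, "text": "مرورگر را باز کن"}


def execute_step(state: State) -> State:
    # Stub: the real step would ask the language model for a command and run it.
    return {**state, "result": f"would handle: {state['text']}"}


def run_pipeline(steps: List[Callable[[State], State]], state: State) -> State:
    """Run each step in order, passing the updated state along."""
    for step in steps:
        state = step(state)
    return state


final_state = run_pipeline([record_step, transcribe_step, execute_step], {})
print(sorted(final_state))
```

Keeping each stage as a function from state to state is what makes a graph framework a natural fit: stages can be reordered, retried, or replaced without touching the others.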

System Requirements

Operating System: Windows (required for command execution).
Python: Version 3.8 or higher.
Microphone: A working microphone for voice input.
Internet: Required for initial setup to download models.
Hardware: GPU recommended for better performance, but CPU is sufficient.
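
A small pre-flight check along these lines can report unmet requirements before anything is installed. This is only a sketch: `check_requirements` is a hypothetical helper, and it covers the operating system and Python version only, not the microphone or GPU.

```python
import platform
import sys


def check_requirements(os_name, python_version):
    """Return a list of unmet requirements (empty when everything is fine)."""
    problems = []
    if os_name != "Windows":
        problems.append(
            f"Windows is required for command execution, found {os_name or 'unknown'}"
        )
    if tuple(python_version[:2]) < (3, 8):
        problems.append("Python 3.8 or higher is required")
    return problems


# Check the machine this script runs on.
for problem in check_requirements(platform.system(), sys.version_info):
    print("Unmet requirement:", problem)
```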

Installation

Follow these steps to set up the Persian Voice Command System:

Install Python 3.8 or Higher
- Download and install from Python.org.
Install Required Libraries
- Run the following command to install dependencies:
```
pip install streamlit pyaudio transformers torchaudio torch langchain-core langchain-ollama langgraph
```
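- Note for Windows Users: PyAudio depends on PortAudio. The prebuilt PyAudio wheels that pip installs on Windows include PortAudio, so a separate PortAudio installation is only needed if pip has to build PyAudio from source.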
Install Ollama
- Follow instructions on Ollama's website.
- Pull the required model:
```
ollama pull qwen2.5
```
- Start Ollama:
```
ollama serve
```

Clone the Repository
```
git clone https://github.com/armanjscript/Persian-Voice-Command-System.git
```
Navigate to the Project Directory
```
cd Persian-Voice-Command-System
```
Run the Application
```
streamlit run app.py
```

Note: A GPU enhances performance for speech-to-text and command generation, but the system runs on CPU as well. Ensure a stable internet connection for initial model downloads.
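
Before launching the app, it can help to confirm that the Ollama server is running and that the model was pulled. The sketch below queries Ollama's `/api/tags` endpoint on its default address, `http://localhost:11434`. The `list_ollama_models` helper is hypothetical, and the address is an assumption that only holds if Ollama runs with its default settings.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def list_ollama_models(base_url=OLLAMA_URL, timeout=3.0):
    """Return local model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; is `ollama serve` running?")
elif not any(name.startswith("qwen2.5") for name in models):
    print("qwen2.5 is not installed; run `ollama pull qwen2.5`")
else:
    print("Ollama is running and qwen2.5 is available")
```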

Usage

Launch the Application
- Run `streamlit run app.py` to open the app in your default web browser.
Interact with the Interface
- Click "Record Command (5 seconds)" to record a voice command in Persian.
- The recorded audio will play back for verification.
- Click "Process Command" to transcribe the audio and execute the command.
- View the transcribed text and command execution result in the interface.
Supported Commands
- Examples of supported Persian commands:
  - "مرورگر را باز کن" (Open Chrome browser)
  - "دانلودها" (Open Downloads folder)
  - "کد را ببند" (Close Visual Studio Code)

Important Notes:

Ensure a working microphone is connected.
Use Persian commands for optimal transcription accuracy.
The system restricts commands to a safe set to prevent unauthorized actions.

Technologies Used

| Technology | Role |
| --- | --- |
| Python | Primary programming language. |
| Streamlit | Creates the interactive web interface. |
| PyAudio | Records audio from the microphone. |
| Transformers | Handles speech-to-text conversion with the Wav2Vec2 model. |
| Torchaudio | Processes audio files for transcription. |
| Torch | Runs the speech-to-text model on CPU or GPU. |
| LangChain | Manages prompts and output parsing for command generation. |
| LangGraph | Structures the workflow for efficient task management. |
| Ollama | Runs the local language model for command processing. |
| subprocess | Executes Windows commands securely. |

Security

The system prioritizes security by restricting command execution to a predefined set of safe commands. This prevents malicious or unauthorized actions, limiting operations to starting specific applications (Chrome, Firefox, VS Code, Explorer), opening designated directories (Downloads, Documents), and terminating specific processes (Chrome, VS Code).
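
One common way to enforce this kind of restriction is an allowlist that maps a small set of action names to fixed argument lists, so that model output only ever selects a key and is never interpolated into a command line. The sketch below illustrates the idea; the action names, the exact Windows commands, and the `resolve_command` and `run_action` helpers are all assumptions for illustration, not the project's actual implementation.

```python
import os
import subprocess

HOME = os.path.expanduser("~")

# Hypothetical allowlist: action name -> fixed argument list.
ALLOWED_COMMANDS = {
    "open_chrome": ["cmd", "/c", "start", "", "chrome"],
    "open_firefox": ["cmd", "/c", "start", "", "firefox"],
    "open_vscode": ["cmd", "/c", "code"],
    "open_explorer": ["explorer"],
    "open_downloads": ["explorer", os.path.join(HOME, "Downloads")],
    "open_documents": ["explorer", os.path.join(HOME, "Documents")],
    "close_chrome": ["taskkill", "/IM", "chrome.exe", "/F"],
    "close_vscode": ["taskkill", "/IM", "Code.exe", "/F"],
}


def resolve_command(action):
    """Return the argv for an allowed action, or raise ValueError."""
    key = action.strip().lower()
    if key not in ALLOWED_COMMANDS:
        raise ValueError(f"Action not allowed: {action!r}")
    return list(ALLOWED_COMMANDS[key])


def run_action(action, dry_run=True):
    """Resolve an action and optionally run it; dry_run only returns the argv."""
    argv = resolve_command(action)
    if not dry_run:
        # Arguments are fixed strings from the allowlist, passed as a list.
        subprocess.run(argv, check=False, timeout=10)
    return argv


print(run_action("open_downloads"))
```

Anything outside the dictionary, such as `resolve_command("format c:")`, raises `ValueError` instead of reaching `subprocess`.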

Contributing

Contributions are welcome! To contribute:

Fork the repository on GitHub.
Create a new branch for your changes.
Submit a pull request with a clear description.
For bug reports or feature requests, open an issue on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, open an issue on GitHub or email [[email protected]].
