Add Habana Gaudi (HPU) Support & Performance Benchmarks for Khoj #1125
base: master
Conversation
Add Dockerfile for HPU runtime along with installation requirements.
Add HPUs (Intel® Gaudi®) support
fix: Device loading
Hi, thanks for creating a PR to add support for Habana HPU to Khoj. Apologies for the delayed review. Not sure if this change should be included in Khoj (yet). Some questions below:
- Are you using Khoj on Intel Gaudi machines? What's the use-case? Gaudi support seems more relevant for production scenarios. For such setups with Khoj, you should offload both the LLM-heavy components (embedding generation and chat model interactions) to appropriate LLM inference servers (like vLLM, SGLang, TensorRT, etc.); a minimal sketch of this pattern follows this list.
- You mention performance benchmarks for Khoj with HPU support. Can you clarify what kind of workloads you tested? Was it the RAG indexing, interacting with a local/offline chat model, or something else? More details on the perf benchmarks would be useful for context.
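(To illustrate the offloading pattern mentioned above: a minimal sketch of chat inference against an OpenAI-compatible endpoint such as one served by vLLM. The URL, API key, and model name are placeholders, not Khoj configuration.)

```python
from openai import OpenAI

# Minimal sketch of the offloading suggestion: send chat requests to an
# OpenAI-compatible inference server (e.g. vLLM) running on the Gaudi host,
# instead of loading the model in-process.
# base_url, api_key, and model below are placeholders, not Khoj config.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="my-served-model",  # whatever model the server exposes
    messages=[{"role": "user", "content": "Summarize my notes on Gaudi."}],
)
print(response.choices[0].message.content)
```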
```diff
@@ -309,15 +312,32 @@ def get_device_memory() -> int:
     return psutil.virtual_memory().total


-def get_device() -> torch.device:
+def get_device(preferred_device=None) -> torch.device:
     """Get device to run model on"""
```
The `preferred_device` arg seems unused. Should we remove it?
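(For context, a hedged sketch of what a `get_device` that actually honors `preferred_device` could look like. The HPU probe via `habana_frameworks` is an assumption based on Habana's PyTorch integration, not the PR's actual code.)

```python
import torch


def get_device(preferred_device: str | None = None) -> torch.device:
    """Return the preferred device if given, else the best available one."""

    def _hpu_available() -> bool:
        # Assumption: habana_frameworks registers the "hpu" device with
        # PyTorch when installed, per Habana's PyTorch integration docs.
        try:
            import habana_frameworks.torch.hpu as hthpu
            return hthpu.is_available()
        except ImportError:
            return False

    if preferred_device is not None:
        return torch.device(preferred_device)
    if torch.cuda.is_available():
        return torch.device("cuda")
    if _hpu_available():
        return torch.device("hpu")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```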
- **`optimum-habana`**: Optimizes models for Habana Gaudi accelerators.
- **`torch-geometric`**: Enables deep learning on graph-based data structures.
- **`numba`**: Accelerates Python code by compiling it to machine code at runtime.
Seems like the only dependency explicitly added is `optimum-habana` (in `pyproject.toml`)?
### 🧠 Device Selection

The application now supports multiple device types, including **CUDA**, **HPU**, **MPS** (Apple Silicon), and **CPU**. You can specify your preferred device by passing the `preferred_device` argument to the `get_device()` function in `helpers.py`. For example:

```python
device = get_device(preferred_device="hpu")  # Use HPU if available
```

If no preferred device is specified, the application will automatically select the best available device.
This isn't used and isn't relevant to Khoj users or deployers, so it should be removed from the documentation.
```dockerfile
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
```
These seem like optional runtime variables to configure Habana support, based on the Habana docs?
If these runtime env vars are the only change this `Dockerfile.hpu` adds, we can drop the `Dockerfile.hpu` file and just mention in our setup documentation that folks wanting to run Khoj on Habana HPU can set these (and other required) environment variables for their setup, referring to the Habana documentation, before starting Khoj?
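(A minimal sketch of that suggestion, assuming the `khoj` CLI entrypoint from the project's setup docs; the variable values mirror the ones in `Dockerfile.hpu`.)

```python
import os
import subprocess

# Sketch of the suggestion above: set the Habana runtime variables in the
# launch environment instead of baking them into a dedicated Dockerfile.
# Values mirror Dockerfile.hpu; the `khoj` entrypoint is an assumption.
hpu_env = {
    "HABANA_VISIBLE_DEVICES": "all",
    "OMPI_MCA_btl_vader_single_copy_mechanism": "none",
    "PT_HPU_LAZY_ACC_PAR_MODE": "0",
    "PT_HPU_ENABLE_LAZY_COLLECTIVES": "1",
}
subprocess.run(["khoj"], env={**os.environ, **hpu_env}, check=True)
```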
```dockerfile
ARG VERSION=0.0.0

# Set environment variables for Habana
ENV HABANA_VISIBLE_DEVICES=all
```
This seems to be a required runtime environment variable to enable Habana HPU? If so, it should just be mentioned in the Khoj setup docs under the HPU tab. See `/documentation/docs/get-started/setup.mdx` for reference.
Updates to this should instead be moved to a new HPU tab under the Khoj setup docs at `/documentation/docs/get-started/setup.mdx` (which maps to https://docs.khoj.dev/get-started/setup/).
1. **Build the HPU Docker Image**:
   Use the provided `Dockerfile.hpu` to build a Docker image optimized for HPU:
   ```bash
   docker build -t khoj-hpu -f Dockerfile.hpu .
   ```
This may not be required if the previous comments on `Dockerfile.hpu` are valid. Folks can just use the default Khoj Dockerfile or image.
This PR introduces support for Habana Gaudi accelerators (HPUs) to the project, enabling the application to run on HPU devices in addition to the existing support for CUDA, MPS, and CPU. The changes include:
🚀 Key Updates
- A new `Dockerfile.hpu` for building an HPU-ready image, including the Habana runtime environment variables.
- Device selection in `helpers.py` extended to cover HPU alongside CUDA, MPS, and CPU, via a `preferred_device` argument to `get_device()`.
- The `optimum-habana` dependency added to `pyproject.toml` to optimize models for Gaudi accelerators.
💎 Why This Matters:
- **HPU Support**: This PR enables the application to leverage Habana Gaudi accelerators, which can provide significant performance improvements for deep learning workloads.
- **Flexibility**: Users can now choose their preferred device (CUDA, HPU, MPS, or CPU) for running the application, making it more versatile across different hardware setups.
- **Optimization**: The addition of `optimum-habana` ensures that models are optimized for HPU and other hardware, improving efficiency and performance.
⚡ Performance Benchmarks
- HPU: ~0.2703 s average runtime (10 runs)
- CPU: ~76.3144 s average runtime (10 runs)
- Result: ~282× speedup using HPU compared to CPU (arithmetic check below).
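(The reported speedup checks out against the two averages:)

```python
# Sanity check on the reported figures: average CPU time / average HPU time.
cpu_avg_s, hpu_avg_s = 76.3144, 0.2703
print(f"speedup ~ {cpu_avg_s / hpu_avg_s:.0f}x")  # -> ~282x
```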
🛠 How to Test
1. Use the new `Dockerfile.hpu` to build and run the application on a system with Habana Gaudi accelerators.
2. Check logs to confirm that HPU is recognized and in use (a hedged verification snippet follows).
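(A hedged snippet for the device check; `habana_frameworks.torch.hpu` and its helpers are assumptions based on Habana's PyTorch docs, not part of this PR.)

```python
import torch
import habana_frameworks.torch.hpu as hthpu  # assumption: Habana's PyTorch plugin

# Quick check that the Gaudi device is visible to PyTorch inside the container.
print("HPU available:", hthpu.is_available())
print("HPU count:", hthpu.device_count())
print(torch.ones(2, 2, device="hpu").device)  # should print hpu:0
```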
📝 Notes
This PR is part of the effort to expand hardware support for the application, ensuring it can run efficiently on a wide range of devices.