Skip to content

Add Habana Gaudi (HPU) Support & Performance Benchmarks for Khoj #1125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

BartoszBLL
Copy link

This PR introduces support for Habana Gaudi accelerators (HPUs) to the project, enabling the application to run on HPU devices in addition to the existing support for CUDA, MPS, and CPU. The changes include:

🚀 Key Updates

  1. HPU Dockerfile (Dockerfile.hpu)
    • Added a new Dockerfile for running Khoj on Habana Gaudi devices.
    • Installs necessary dependencies (optimum-habana).
    • Configures environment variables for Habana optimizations.
  2. Device Selection (helpers.py)
    • Enhanced get_device() to detect HPU if available.
    • Supports cuda, hpu, mps, or cpu based on availability or user preference.
  3. Memory Management (helpers.py)
    • get_device_memory() now supports Habana HPU memory queries.
  4. Dependency Updates (pyproject.toml)
    • Added optimum-habana, torch-geometric, and numba.
  5. Documentation
    • README.md & src/khoj/app/README.md: Instructions for building and running Khoj with HPU.

💎 Why This Matters:

HPU Support

This PR enables the application to leverage Habana Gaudi accelerators, which can provide significant performance improvements for deep learning workloads.

Flexibility

Users can now choose their preferred device (CUDA, HPU, MPS, or CPU) for running the application, making it more versatile across different hardware setups.

Optimization

The addition of optimum-habana ensures that models are optimized for HPU and other hardware, improving efficiency and performance.

⚡ Performance Benchmarks

HPU: ~0.2703s average runtime (10 runs)
CPU: ~76.3144s average runtime (10 runs)
Result: ~282× speedup using HPU compared to CPU.

🛠 How to Test

Use the new Dockerfile.hpu to build and run the application on a system with Habana Gaudi accelerators.

# Build the HPU Docker image
docker build -t khoj-hpu -f Dockerfile.hpu .

# Run with Habana runtime
docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p <PORT>:<PORT> khoj-hpu

Check logs to confirm that HPU is recognized and in use.

✅ Checklist

  • Tested on Habana Gaudi accelerator
  • Verified compatibility with CPU and other devices
  • Updated documentation
  • Added required dependencies

📝 Notes

This PR is part of the effort to expand hardware support for the application, ensuring it can run efficiently on a wide range of devices.

Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for creating a PR to add support for Habana HPU to Khoj. Apologize for the delayed review. Not sure if this change should be included in Khoj (yet). Some questions below:

  1. Are you using Khoj on Intel Gaudi machines? What's the use-case? Gaudi support seems more relevant for production scenarios. For such setups with Khoj, you should offload both the LLM heavy components (embedding generation and chat model interactions) to appropriate llm inference servers (like vllm, sglang, tensort etc.).
  2. You mention performance benchmarks for Khoj with HPU support. Can you clarify what kind of workloads you tested? Was it the rag indexing, interacting with a local/offline chat model or something else? More details on the perf benchmarks would be useful for context

@@ -309,15 +312,32 @@ def get_device_memory() -> int:
return psutil.virtual_memory().total


def get_device() -> torch.device:
"""Get device to run model on"""
def get_device(preferred_device=None) -> torch.device:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred_device arg seems unused. Should we remove it?

Comment on lines +122 to +124
- **`optimum-habana`**: Optimizes models for Habana Gaudi accelerators.
- **`torch-geometric`**: Enables deep learning on graph-based data structures.
- **`numba`**: Accelerates Python code by compiling it to machine code at runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the only dependency explicitly added is optimum-habana (in pyproject.toml)?

Comment on lines +128 to +137
### 🧠 Device Selection

The application now supports multiple device types, including **CUDA**, **HPU**, **MPS** (Apple Silicon), and **CPU**. You can specify your preferred device by passing the `preferred_device` argument to the `get_device()` function in `helpers.py`. For example:

```python
device = get_device(preferred_device="hpu") # Use HPU if available
```

If no preferred device is specified, the application will automatically select the best available device.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used and is relevant to Khoj users or deployers. So should be removed from the documention

Comment on lines +33 to +35
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like optional runtime variables to configure habana support based on the Habana docs?

If these runtime env vars are the only change this Dockerfile.hpu adds, we can drop this Dockerfile.hpu file and just mention in our setup documentation that folks wanting to run khoj on habana hpu can setup these (and other required) environment variables for their setup by referring to the Habana documentation before starting khoj?

ARG VERSION=0.0.0

# Set environment variables for Habana
ENV HABANA_VISIBLE_DEVICES=all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seems to be a required runtime environment variable to enable habana hpu? If so, it just be mentioned in the Khoj setup docs under the HPU tab. See /documentation/docs/get-started/setup.mdx for reference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates to this should be instead moved to a new HPU tab under the Khoj setup docs at /documentation/docs/get-started/setup.mdx (which maps to https://docs.khoj.dev/get-started/setup/)

Comment on lines +102 to +107
1. **Build the HPU Docker Image**:
Use the provided `Dockerfile.hpu` to build a Docker image optimized for HPU:
```bash
docker build -t khoj-hpu -f Dockerfile.hpu .
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be required if the previous comments on the Dockerfile.hpu are valid. Folks can just use the default Khoj dockerfile or image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants