Add Habana Gaudi (HPU) Support & Performance Benchmarks for Khoj #1125

Open · wants to merge 9 commits into base: master
72 changes: 72 additions & 0 deletions Dockerfile.hpu
@@ -0,0 +1,72 @@
# syntax=docker/dockerfile:1
FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest AS base
LABEL homepage="https://khoj.dev"
LABEL repository="https://github.com/khoj-ai/khoj"
LABEL org.opencontainers.image.source="https://github.com/khoj-ai/khoj"
LABEL org.opencontainers.image.description="Your second brain, containerized for personal, local deployment."

# Install System Dependencies
RUN apt update -y && apt -y install \
    python3-pip \
    tzdata \
    swig \
    curl \
    # Required by RapidOCR
    libgl1 \
    libglx-mesa0 \
    libglib2.0-0 \
    # Required by llama-cpp-python pre-built wheels. See #1628
    musl-dev && \
    ln -s /usr/lib/x86_64-linux-musl/libc.so /lib/libc.musl-x86_64.so.1 && \
    # Clean up
    apt clean && rm -rf /var/lib/apt/lists/*

# Build Server
FROM base AS server-deps
WORKDIR /app
COPY pyproject.toml .
COPY README.md .
ARG VERSION=0.0.0

# Set environment variables for Habana
ENV HABANA_VISIBLE_DEVICES=all
Member commented:

This seems to be a required runtime environment variable to enable Habana HPU? If so, it should just be mentioned in the Khoj setup docs under the HPU tab. See /documentation/docs/get-started/setup.mdx for reference.

ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
Comment on lines +33 to +35
Member commented:

These seem like optional runtime variables to configure Habana support, based on the Habana docs?

If these runtime env vars are the only change this Dockerfile.hpu adds, we can drop the Dockerfile.hpu file and just mention in our setup documentation that folks wanting to run Khoj on Habana HPU can set these (and other required) environment variables for their setup by referring to the Habana documentation before starting Khoj?
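
For illustration, if `Dockerfile.hpu` were dropped, a deployer could export these variables from a small launch wrapper instead. A minimal sketch, assuming the values from this diff and the repo's `python3 src/khoj/main.py` entrypoint:

```python
import os
import subprocess

# Values taken from this PR's Dockerfile.hpu; check the Habana docs for
# which variables your Gaudi setup actually requires.
hpu_env = {
    "HABANA_VISIBLE_DEVICES": "all",
    "OMPI_MCA_btl_vader_single_copy_mechanism": "none",
    "PT_HPU_LAZY_ACC_PAR_MODE": "0",
    "PT_HPU_ENABLE_LAZY_COLLECTIVES": "1",
}
os.environ.update(hpu_env)

# Start Khoj the same way the Dockerfile's entrypoint does.
subprocess.run(["python3", "src/khoj/main.py"], check=True)
```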



# use the pre-built llama-cpp-python, torch cpu wheel
ENV PIP_EXTRA_INDEX_URL="https://abetlen.github.io/llama-cpp-python/whl/cpu"
# avoid downloading unused cuda specific python packages
ENV CUDA_VISIBLE_DEVICES=""
RUN sed -i "s/dynamic = \\[\"version\"\\]/version = \"$VERSION\"/" pyproject.toml && \
    pip install --no-cache-dir .

# Build Web App
FROM node:20-alpine AS web-app
# Set build optimization env vars
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
WORKDIR /app/src/interface/web
# Install dependencies first (cache layer)
COPY src/interface/web/package.json src/interface/web/yarn.lock ./
RUN yarn install --frozen-lockfile
# Copy source and build
COPY src/interface/web/. ./
RUN yarn build

# Merge the Server and Web App into a Single Image
FROM base
ENV PYTHONPATH=/app/src
WORKDIR /app
COPY --from=server-deps /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=web-app /app/src/interface/web/out ./src/khoj/interface/built
COPY . .
RUN cd src && python3 khoj/manage.py collectstatic --noinput

# Run the Application
# There are more arguments required for the application to run,
# but those should be passed in through the docker-compose.yml file.
ARG PORT
EXPOSE ${PORT}
ENTRYPOINT ["python3", "src/khoj/main.py"]
6 changes: 6 additions & 0 deletions README.md
@@ -67,6 +67,12 @@ You can see the full feature list [here](https://docs.khoj.dev/category/features)

To get started with self-hosting Khoj, [read the docs](https://docs.khoj.dev/get-started/setup).

## 🚀 HPU (Habana Processing Unit) Support

We now support running Khoj on **Habana Gaudi accelerators (HPUs)**! This lets you use Habana's AI processors for faster, more efficient model inference.

For more information, see [here](src/khoj/app/README.md#-hpu-support).

## Enterprise

Khoj is available as a cloud service, on-premises, or as a hybrid solution. To learn more about Khoj Enterprise, [visit our website](https://khoj.dev/teams).
1 change: 1 addition & 0 deletions pyproject.toml
@@ -91,6 +91,7 @@ dependencies = [
"google-generativeai == 0.8.3",
"pyjson5 == 1.6.7",
"resend == 1.0.1",
"optimum-habana == 1.14.1",
"email-validator == 2.2.0",
]
dynamic = ["version"]
48 changes: 48 additions & 0 deletions src/khoj/app/README.md
Member commented:
Updates to this should instead be moved to a new HPU tab under the Khoj setup docs at /documentation/docs/get-started/setup.mdx (which maps to https://docs.khoj.dev/get-started/setup/).

@@ -92,3 +92,51 @@ While we're using Django for the ORM, we're still using the FastAPI server for t
```bash
python3 src/khoj/main.py --anonymous-mode
```


## 🚀 HPU Support
### 🛠️ Setup for HPU

To run Khoj on a Habana Gaudi device, follow these steps:

1. **Build the HPU Docker Image**:
Use the provided `Dockerfile.hpu` to build a Docker image optimized for HPU:
```bash
docker build -t khoj-hpu -f Dockerfile.hpu .
```

Comment on lines +102 to +107
Member commented:

This may not be required if the previous comments on Dockerfile.hpu are valid. Folks can just use the default Khoj Dockerfile or image.

2. **Run the Docker Container**:
Start the container with the appropriate environment variables for HPU:
```bash
docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p <PORT>:<PORT> khoj-hpu
```
Replace `<PORT>` with the port number you want to expose.

3. **Verify HPU Support**:
Ensure that the application detects the HPU device by checking the logs; the application will automatically use the HPU if available. A quick manual check is sketched below.
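
A minimal sketch of such a check, mirroring the detection logic this PR adds in `src/khoj/utils/helpers.py` (run it inside the container):

```python
# Hedged sketch: uses the same habana_frameworks loader as this PR's helpers.py.
import importlib.util

import torch

if importlib.util.find_spec("habana_frameworks") is not None:
    from habana_frameworks.torch.utils.library_loader import load_habana_module

    load_habana_module()  # registers the 'hpu' device type with torch
    print("HPU available:", torch.hpu.is_available())
else:
    print("habana_frameworks not installed; Khoj will fall back to CUDA/MPS/CPU")
```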

### 📦 New Dependencies

To support HPU and other advanced features, we've added the following dependencies:

- **`optimum-habana`**: Optimizes models for Habana Gaudi accelerators.
- **`torch-geometric`**: Enables deep learning on graph-based data structures.
- **`numba`**: Accelerates Python code by compiling it to machine code at runtime.
Comment on lines +122 to +124
Member commented:

Seems like the only dependency explicitly added is optimum-habana (in pyproject.toml)?


These dependencies are automatically installed when you build the Docker image or install the project locally.

### 🧠 Device Selection

The application now supports multiple device types, including **CUDA**, **HPU**, **MPS** (Apple Silicon), and **CPU**. You can specify your preferred device by passing the `preferred_device` argument to the `get_device()` function in `helpers.py`. For example:

```python
device = get_device(preferred_device="hpu") # Use HPU if available
```

If no preferred device is specified, the application will automatically select the best available device.
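
Putting the two helpers together, a short usage sketch (the `khoj.utils.helpers` module path is assumed from this repo's `src/` layout):

```python
from khoj.utils.helpers import get_device, get_device_memory

device = get_device()  # auto-selects hpu > cuda > mps > cpu
total_bytes = get_device_memory()  # raw bytes on the CUDA/HPU/CPU paths in this diff
print(f"Using {device.type} with {total_bytes / 1024**3:.1f} GiB of memory")
```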

Comment on lines +128 to +137
Member commented:

This isn't used and isn't relevant to Khoj users or deployers, so it should be removed from the documentation.

### 📝 Notes

- Ensure that your system has the necessary Habana drivers and software stack installed to use HPUs.
- For more information on Habana Gaudi accelerators, visit the [Habana Labs documentation](https://docs.habana.ai/).

32 changes: 26 additions & 6 deletions src/khoj/utils/helpers.py
@@ -2,6 +2,7 @@

  import copy
  import datetime
+ import importlib
  import io
  import ipaddress
  import logging
@@ -301,6 +302,8 @@ def log_telemetry(
  def get_device_memory() -> int:
      """Get device memory in GB"""
      device = get_device()
+     if device.type == "hpu":
+         return torch.hpu.get_device_properties(device).total_memory
      if device.type == "cuda":
          return torch.cuda.get_device_properties(device).total_memory
      elif device.type == "mps":

@@ -309,15 +312,32 @@ def get_device_memory() -> int:
      return psutil.virtual_memory().total


- def get_device() -> torch.device:
-     """Get device to run model on"""
+ def get_device(preferred_device=None) -> torch.device:
Member commented:

The preferred_device arg seems unused. Should we remove it?

"""
Determine the appropriate device to use (cuda, hpu, or cpu).
Args:
preferred_device (str): User-preferred device ('cuda', 'hpu', or 'cpu').
Returns:
torch.device: 'cuda', 'hpu', 'mps' or 'cpu'.
"""
# Check for HPU support
if importlib.util.find_spec("habana_frameworks") is not None:
from habana_frameworks.torch.utils.library_loader import load_habana_module

load_habana_module()
if torch.hpu.is_available():
if preferred_device is None or "hpu" in preferred_device:
return torch.device("hpu")
# Use CUDA GPU if available
if torch.cuda.is_available():
# Use CUDA GPU
return torch.device("cuda:0")
if preferred_device is None or "cuda" in preferred_device:
return torch.device("cuda:0")
# Use Apple M1 Metal Acceleration if available
elif torch.backends.mps.is_available():
# Use Apple M1 Metal Acceleration
return torch.device("mps")
if preferred_device is None or "mps" in preferred_device:
return torch.device("mps")
else:
# Default to CPU
return torch.device("cpu")
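
For reference, a hedged example of calling the new signature (nothing in this PR appears to call it yet, per the comment above). Note that, as written, a preference that does not match the available hardware can fall through these branches without returning a device:

```python
import torch

from khoj.utils.helpers import get_device

# Ask for HPU explicitly; on a Gaudi machine this should return torch.device("hpu").
device = get_device(preferred_device="hpu")
embeddings = torch.zeros((2, 384), device=device)  # hypothetical embedding buffer
print(embeddings.device)
```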

