# Add Habana Gaudi (HPU) Support & Performance Benchmarks for Khoj #1125
Base branch: `master`
Changes from all commits: `3c891e1`, `4e80e82`, `a17d22e`, `d9f7fea`, `a69780c`, `9ecdeb7`, `d3c3244`, `8564d1e`, `c403a05`
**`Dockerfile.hpu`** (new file, +72 lines):

```dockerfile
# syntax=docker/dockerfile:1
FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest AS base
LABEL homepage="https://khoj.dev"
LABEL repository="https://github.com/khoj-ai/khoj"
LABEL org.opencontainers.image.source="https://github.com/khoj-ai/khoj"
LABEL org.opencontainers.image.description="Your second brain, containerized for personal, local deployment."

# Install System Dependencies
RUN apt update -y && apt -y install \
    python3-pip \
    tzdata \
    swig \
    curl \
    # Required by RapidOCR
    libgl1 \
    libglx-mesa0 \
    libglib2.0-0 \
    # Required by llama-cpp-python pre-built wheels. See #1628
    musl-dev && \
    ln -s /usr/lib/x86_64-linux-musl/libc.so /lib/libc.musl-x86_64.so.1 && \
    # Clean up
    apt clean && rm -rf /var/lib/apt/lists/*

# Build Server
FROM base AS server-deps
WORKDIR /app
COPY pyproject.toml .
COPY README.md .
ARG VERSION=0.0.0

# Set environment variables for Habana
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV PT_HPU_LAZY_ACC_PAR_MODE=0
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=1
```
> **Review comment on lines +33 to +35:** These seem like optional runtime variables to configure Habana support, based on the Habana docs? If these runtime env vars are the only change this `Dockerfile.hpu` adds, we can drop this `Dockerfile.hpu`.
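If the reviewer's suggestion holds, these variables could be supplied when the container starts instead of being baked into the image. A hypothetical invocation against the default published Khoj image (image name assumed here), reusing the same flags this PR's docs use:

```bash
docker run --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e PT_HPU_LAZY_ACC_PAR_MODE=0 \
  -e PT_HPU_ENABLE_LAZY_COLLECTIVES=1 \
  ghcr.io/khoj-ai/khoj:latest
```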
The remainder of `Dockerfile.hpu`:

```dockerfile
# use the pre-built llama-cpp-python, torch cpu wheel
ENV PIP_EXTRA_INDEX_URL="https://abetlen.github.io/llama-cpp-python/whl/cpu"
# avoid downloading unused cuda specific python packages
ENV CUDA_VISIBLE_DEVICES=""
RUN sed -i "s/dynamic = \[\"version\"\]/version = \"$VERSION\"/" pyproject.toml && \
    pip install --no-cache-dir .

# Build Web App
FROM node:20-alpine AS web-app
# Set build optimization env vars
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
WORKDIR /app/src/interface/web
# Install dependencies first (cache layer)
COPY src/interface/web/package.json src/interface/web/yarn.lock ./
RUN yarn install --frozen-lockfile
# Copy source and build
COPY src/interface/web/. ./
RUN yarn build

# Merge the Server and Web App into a Single Image
FROM base
ENV PYTHONPATH=/app/src
WORKDIR /app
COPY --from=server-deps /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=web-app /app/src/interface/web/out ./src/khoj/interface/built
COPY . .
RUN cd src && python3 khoj/manage.py collectstatic --noinput

# Run the Application
# There are more arguments required for the application to run,
# but those should be passed in through the docker-compose.yml file.
ARG PORT
EXPOSE ${PORT}
ENTRYPOINT ["python3", "src/khoj/main.py"]
```
> **Review comment:** Updates to this should instead be moved to a new HPU tab under the Khoj setup docs.
**Development docs** (modified): the hunk `@@ -92,3 +92,51 @@ While we're using Django for the ORM, we're still using the FastAPI server for t` appends the following HPU section after the existing anonymous-mode snippet:

```bash
python3 src/khoj/main.py --anonymous-mode
```
## 🚀 HPU Support

### 🛠️ Setup for HPU

To run Khoj on a Habana Gaudi device, follow these steps:
1. **Build the HPU Docker Image**:
   Use the provided `Dockerfile.hpu` to build a Docker image optimized for HPU:

   ```bash
   docker build -t khoj-hpu -f Dockerfile.hpu .
   ```
> **Review comment on lines +102 to +107:** This may not be required if the previous comments on the `Dockerfile.hpu` are valid. Folks can just use the default Khoj dockerfile or image.
2. **Run the Docker Container**:
   Start the container with the appropriate environment variables for HPU:

   ```bash
   docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p <PORT>:<PORT> khoj-hpu
   ```

   Replace `<PORT>` with the port number you want to expose (a concrete example follows this list).
3. **Verify HPU Support**:
   Ensure that the application detects the HPU device by checking the logs; the application will automatically use the HPU if available. A quick check is sketched after this list.
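For instance, assuming Khoj serves on its default port `42110`:

```bash
docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p 42110:42110 khoj-hpu
```

And to confirm PyTorch can see the Gaudi device from a Python shell inside the container, a minimal sketch that mirrors the detection logic this PR adds to `helpers.py`:

```python
import torch
from habana_frameworks.torch.utils.library_loader import load_habana_module

load_habana_module()  # registers the HPU backend with PyTorch
print(torch.hpu.is_available())  # True when a Gaudi device is usable
```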
### 📦 New Dependencies

To support HPU and other advanced features, we've added the following dependencies:

- **`optimum-habana`**: Optimizes models for Habana Gaudi accelerators.
- **`torch-geometric`**: Enables deep learning on graph-based data structures.
- **`numba`**: Accelerates Python code by compiling it to machine code at runtime.
> **Review comment on lines +122 to +124:** Seems like the only dependency explicitly added is …
These dependencies are automatically installed when you build the Docker image or install the project locally.
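If you are installing outside Docker, the equivalent manual step would presumably be:

```bash
pip install optimum-habana torch-geometric numba
```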
### 🧠 Device Selection

The application now supports multiple device types, including **CUDA**, **HPU**, **MPS** (Apple Silicon), and **CPU**. You can specify your preferred device by passing the `preferred_device` argument to the `get_device()` function in `helpers.py`. For example:
```python
device = get_device(preferred_device="hpu")  # Use HPU if available
```

If no preferred device is specified, the application will automatically select the best available device.
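A sketch of the automatic path, with the fallback order taken from the `get_device()` change in `helpers.py` below:

```python
device = get_device()  # tries HPU first, then CUDA, then MPS, then CPU
```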
> **Review comment on lines +128 to +137:** This isn't used and isn't relevant to Khoj users or deployers, so it should be removed from the documentation.
### 📝 Notes

- Ensure that your system has the necessary Habana drivers and software stack installed to use HPUs.
- For more information on Habana Gaudi accelerators, visit the [Habana Labs documentation](https://docs.habana.ai/).
**`helpers.py`** (modified):

```diff
@@ -2,6 +2,7 @@
 import copy
 import datetime
+import importlib
 import io
 import ipaddress
 import logging
```
```diff
@@ -301,6 +302,8 @@ def log_telemetry(
 def get_device_memory() -> int:
     """Get device memory in GB"""
     device = get_device()
+    if device.type == "hpu":
+        return torch.hpu.get_device_properties(device).total_memory
     if device.type == "cuda":
         return torch.cuda.get_device_properties(device).total_memory
     elif device.type == "mps":
```
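Worth noting while verifying this hunk: `torch.cuda.get_device_properties(...).total_memory` and `psutil.virtual_memory().total` report bytes, and the new HPU branch presumably does the same, so despite the "in GB" docstring a caller would likely convert explicitly:

```python
# Hypothetical caller-side conversion from bytes to gigabytes (GiB).
memory_gb = get_device_memory() / 1024**3
```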
The `get_device()` rewrite:

```diff
@@ -309,15 +312,32 @@
     return psutil.virtual_memory().total


-def get_device() -> torch.device:
-    """Get device to run model on"""
+def get_device(preferred_device=None) -> torch.device:
+    """
+    Determine the appropriate device to use (cuda, hpu, or cpu).
+    Args:
+        preferred_device (str): User-preferred device ('cuda', 'hpu', or 'cpu').
+    Returns:
+        torch.device: 'cuda', 'hpu', 'mps' or 'cpu'.
+    """
+    # Check for HPU support
+    if importlib.util.find_spec("habana_frameworks") is not None:
+        from habana_frameworks.torch.utils.library_loader import load_habana_module
+
+        load_habana_module()
+        if torch.hpu.is_available():
+            if preferred_device is None or "hpu" in preferred_device:
+                return torch.device("hpu")
+    # Use CUDA GPU if available
     if torch.cuda.is_available():
-        # Use CUDA GPU
-        return torch.device("cuda:0")
+        if preferred_device is None or "cuda" in preferred_device:
+            return torch.device("cuda:0")
+    # Use Apple M1 Metal Acceleration if available
     elif torch.backends.mps.is_available():
-        # Use Apple M1 Metal Acceleration
-        return torch.device("mps")
+        if preferred_device is None or "mps" in preferred_device:
+            return torch.device("mps")
     else:
         # Default to CPU
         return torch.device("cpu")
```
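For reviewers trying the change locally, a minimal usage sketch (assuming `get_device` stays importable from `khoj.utils.helpers`):

```python
from khoj.utils.helpers import get_device

# Returns torch.device("hpu") only when the Habana stack is installed
# and a Gaudi device is visible; with no argument it auto-selects.
device = get_device(preferred_device="hpu")
print(f"Selected device: {device}")
```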
> **Review comment:** These seem to be required runtime environment variables to enable Habana HPU? If so, they should just be mentioned in the Khoj setup docs under the HPU tab. See `/documentation/docs/get-started/setup.mdx` for reference.