[WIP] backed_Size_oblivious + pytorch 2.8 #20719
base: main
@@ -50,7 +50,7 @@ ARG UV_INDEX_URL=${PIP_INDEX_URL}
 ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}

 # PyTorch provides its own indexes for standard and nightly builds
-ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl/test
 ARG PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly

 # PIP supports multiple authentication schemes, including keyring
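Later in this Dockerfile, the index base URL above is suffixed with a `cu`-prefixed component computed by `cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')`. A minimal Python equivalent of that shell pipeline (the function name is illustrative, not part of the build):

```python
def cuda_index_suffix(cuda_version: str) -> str:
    """Mimic `cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')`:
    keep major.minor, drop the dot, prefix with "cu"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"

print(cuda_index_suffix("12.8.1"))  # -> cu128
```

So a `CUDA_VERSION` of `12.8.1` selects the `.../whl/test/cu128` index.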
@@ -363,6 +363,16 @@ RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist
     uv pip install --system dist/*.whl --verbose \
         --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')

+# TODO (huydhn): Remove this once xformers is released for 2.8.0
+# https://pytorch.s3.us-east-1.amazonaws.com/whl/test/cu128/xformers/xformers-0.0.30%2B4cf69f09.d20250708-cp312-cp312-linux_x86_64.whl
+RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
+    . /etc/environment
+    export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0 12.0'
+    uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/[email protected]"
+    # DEBUG
+    python3 -m xformers.info
+BASH
+
 # If we need to build FlashInfer wheel before its release:
 # $ # Note we remove 7.0 from the arch list compared to the list below, since FlashInfer only supports sm75+
 # $ export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0a 12.0'
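The FlashInfer comment above notes that `7.0` is dropped from `TORCH_CUDA_ARCH_LIST` because FlashInfer only supports sm75+. As a sketch, that trimming could be expressed in Python like this (the helper name and threshold parameter are illustrative):

```python
def trim_arch_list(archs: str, min_sm: float = 7.5) -> str:
    """Drop compute capabilities below min_sm from a space-separated
    TORCH_CUDA_ARCH_LIST string, e.g. remove "7.0" for sm75+-only libraries."""
    kept = []
    for arch in archs.split():
        # Strip the accelerated-variant suffix (e.g. "9.0a") before comparing
        numeric = float(arch.rstrip("a"))
        if numeric >= min_sm:
            kept.append(arch)
    return " ".join(kept)

print(trim_arch_list("7.0 7.5 8.0 8.9 9.0a 10.0a 12.0"))
# -> 7.5 8.0 8.9 9.0a 10.0a 12.0
```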
@@ -376,8 +386,8 @@ RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist
 # $ # upload the wheel to a public location, e.g. https://wheels.vllm.ai/flashinfer/v0.2.6.post1/flashinfer_python-0.2.6.post1-cp39-abi3-linux_x86_64.whl

 # Allow specifying a version, Git revision or local .whl file
-ARG FLASHINFER_CUDA128_INDEX_URL="https://download.pytorch.org/whl/cu128/flashinfer"
-ARG FLASHINFER_CUDA128_WHEEL="flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
+ARG FLASHINFER_CUDA128_INDEX_URL="https://download.pytorch.org/whl/test/cu128/flashinfer"
+ARG FLASHINFER_CUDA128_WHEEL="flashinfer_python-0.2.6.post1%2Bcu128torch2.8-cp39-abi3-linux_x86_64.whl"
 ARG FLASHINFER_GIT_REPO="https://github.com/flashinfer-ai/flashinfer.git"
 ARG FLASHINFER_GIT_REF="v0.2.6.post1"
 RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
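Note the `%2B` in the wheel filenames above: it is the URL-encoded `+` that separates the version from the local version segment (`cu128torch2.8`), which is how the torch compatibility bump is visible in the filename itself. Decoding it with the standard library:

```python
from urllib.parse import unquote

wheel = "flashinfer_python-0.2.6.post1%2Bcu128torch2.8-cp39-abi3-linux_x86_64.whl"
decoded = unquote(wheel)
print(decoded)
# -> flashinfer_python-0.2.6.post1+cu128torch2.8-cp39-abi3-linux_x86_64.whl

# The local version segment after "+" encodes the CUDA/torch pairing
local_segment = decoded.split("+", 1)[1].split("-", 1)[0]
print(local_segment)  # -> cu128torch2.8
```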
@@ -6,9 +6,10 @@ numba == 0.61.2; python_version > '3.9'

 # Dependencies for NVIDIA GPUs
 ray[cgraph]>=2.43.0, !=2.44.* # Ray Compiled Graph, required for pipeline parallelism in V1.
-torch==2.7.0
-torchaudio==2.7.0
+torch==2.8.0
+torchaudio==2.8.0
 # These must be updated alongside torch
-torchvision==0.22.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
+torchvision==0.23.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
-xformers==0.0.30; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.7
+# TODO (huydhn): Re-enable this once xformers is released for 2.8.0
+# https://github.com/facebookresearch/xformers/releases/tag/v0.0.30
+# git+https://github.com/facebookresearch/xformers@v0.0.30; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.7
@@ -18,7 +18,9 @@ apt autoremove -y

 echo 'import os; os.system("touch /tmp/changed.file")' >> vllm/__init__.py

-VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 VLLM_USE_PRECOMPILED=1 pip3 install -vvv -e .
+# TESTING, TO BE REMOVED
+VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 VLLM_USE_PRECOMPILED=1 pip3 install -vvv -e . \
+    --extra-index-url https://download.pytorch.org/whl/test/cu128

 # Run the script
 python3 -c 'import vllm'
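The `echo ... >> vllm/__init__.py` line above plants a side effect in the package's `__init__.py` so that a later `import vllm` proves the editable install is executing the locally modified source. The same mechanism can be sketched self-contained (package and file names here are placeholders):

```python
import importlib
import os
import sys
import tempfile

def editable_install_check() -> bool:
    """Write a package whose __init__.py creates a marker file, import it,
    and report whether the marker appeared (i.e. the local source ran)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, "fakepkg")
        os.makedirs(pkg_dir)
        marker = os.path.join(tmp, "changed.file")
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            # Side-effecting line, analogous to the touch in the script above
            f.write(f'open(r"{marker}", "w").close()\n')
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("fakepkg")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fakepkg", None)
        return os.path.exists(marker)

print(editable_install_check())  # -> True
```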
@@ -1216,10 +1216,11 @@ def load_model(self) -> None:
                 CompilationLevel.DYNAMO_AS_IS and supports_dynamo():
             backend = self.vllm_config.compilation_config.init_backend(
                 self.vllm_config)
-            self.model = torch.compile(
-                self.model,
-                fullgraph=envs.VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE,
-                backend=backend)
+            with torch.fx.experimental._config.patch(backed_size_oblivious=True):
+                self.model = torch.compile(
+                    self.model,
+                    fullgraph=envs.VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE,
+                    backend=backend)

     def get_model(self) -> nn.Module:
         return self.model

Review comment: how can we localize this to lora?
Review comment: This debugging line should be removed before the pull request is merged. It's good for development, but shouldn't be in the final Docker image.