Add an optimization doc on TPU #21155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Draft

bvrockwell wants to merge 10 commits into vllm-project:main from bvrockwell:main

+62 −0

Contributor

bvrockwell commented Jul 18, 2025 •

edited by github-actions bot

Loading

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

bvrockwell and others added 9 commits

July 15, 2025 14:29


          draft

336ce6f


          rename

ecd6308


          add tuner

de7986e


          add information about calculator

19164c8


          updating the docs

a0eedfa


          Merge branch 'vllm-project:main' into main

0585f3b


          update

62e520e


          add link to more docs

caf6377


          Merge branch 'vllm-project:main' into main

9d4a924

bvrockwell requested a review from hmellor as a code owner

July 18, 2025 00:52

github-actions bot commented Jul 18, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify bot added documentation tpu labels

gemini-code-assist bot reviewed

View reviewed changes

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request adds a new documentation page with optimization tips for using vLLM on TPUs. The content is valuable and well-structured. I've identified a few issues, mainly related to broken or fragile links and a missing image, which could impact the user experience. I've also found a typo in a command-line argument that could cause issues for users who copy-paste it. Please see my detailed comments.

docs/configuration/tpu/README.md


		### Get started

		Looking for setup and installation instructions? Find them [here](https://vllm--19708.org.readthedocs.build/en/19708/getting_started/installation/google_tpu.html).

Contributor

gemini-code-assist bot Jul 18, 2025

The link to the setup and installation instructions points to a temporary ReadTheDocs build for this PR. This link will become invalid after the PR is merged. Please update it to a permanent link, likely pointing to the official vLLM documentation.

Suggested change

      
            Looking for setup and installation instructions? Find them [here](https://vllm--19708.org.readthedocs.build/en/19708/getting_started/installation/google_tpu.html).
          
            Looking for setup and installation instructions? Find them [here](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html).

docs/configuration/tpu/README.md


		When selecting the ideal number of chips for a single serving instance, it's important to account for both the model size and the average request context length. Adequate HBM for the KV cache is essential to ensure a sufficient number of concurrent requests can be processed.

		The following colab [calculator](https://colab.sandbox.google.com/drive/1M_f3xZm-_Ce2D-UMAyGNyacEIN-6rUbf) will tell you:

Contributor

gemini-code-assist bot Jul 18, 2025

The link to the Colab calculator uses a sandbox.google.com domain, which may not be accessible to all users. Please use a public, shareable link (e.g., from colab.research.google.com) to ensure everyone can access the calculator.

docs/configuration/tpu/README.md Outdated Show resolved Hide resolved

docs/configuration/tpu/README.md


		#### max model len vs. most model len

		![image](most_model_len.png)

Contributor

gemini-code-assist bot Jul 18, 2025

The image most_model_len.png is referenced here, but it doesn't seem to be included in this pull request. This will result in a broken image in the documentation. Please add the image file to the repository, for example in the same directory as this README.

docs/configuration/tpu/README.md


		### Tune your workloads!

		Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](https://github.com/vllm-project/vllm/pull/20779/files?short_path=f9b273a#diff-f9b273a10e0688ba63c38bd93a2e64ceb54d4fdd7ff7b82d347df06d0d34e39c) to optimize your workloads for your use case.

Contributor

gemini-code-assist bot Jul 18, 2025

Linking to a pull request is not ideal for documentation as PRs are not permanent resources in the same way that documentation pages are. The link is also very specific and might break easily. It would be better to link to the official documentation for the auto-tuner feature once it's available, or to a more stable resource.

bvrockwell marked this pull request as draft

July 18, 2025 00:53


          Update docs/configuration/tpu/README.md

a764418

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation tpu