Skip to content

Add an optimization doc on TPU #21155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

Conversation

bvrockwell
Copy link
Contributor

@bvrockwell bvrockwell commented Jul 18, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

@bvrockwell bvrockwell requested a review from hmellor as a code owner July 18, 2025 00:52
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added documentation Improvements or additions to documentation tpu Related to Google TPUs labels Jul 18, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a new documentation page with optimization tips for using vLLM on TPUs. The content is valuable and well-structured. I've identified a few issues, mainly related to broken or fragile links and a missing image, which could impact the user experience. I've also found a typo in a command-line argument that could cause issues for users who copy-paste it. Please see my detailed comments.


### **Get started**

Looking for setup and installation instructions? Find them [here](https://vllm--19708.org.readthedocs.build/en/19708/getting_started/installation/google_tpu.html).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The link to the setup and installation instructions points to a temporary ReadTheDocs build for this PR. This link will become invalid after the PR is merged. Please update it to a permanent link, likely pointing to the official vLLM documentation.

Suggested change
Looking for setup and installation instructions? Find them [here](https://vllm--19708.org.readthedocs.build/en/19708/getting_started/installation/google_tpu.html).
Looking for setup and installation instructions? Find them [here](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html).


When selecting the ideal number of chips for a single serving instance, it's important to account for both the model size and the average request context length. Adequate HBM for the KV cache is essential to ensure a sufficient number of concurrent requests can be processed.

The following colab [calculator](https://colab.sandbox.google.com/drive/1M_f3xZm-_Ce2D-UMAyGNyacEIN-6rUbf) will tell you:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The link to the Colab calculator uses a sandbox.google.com domain, which may not be accessible to all users. Please use a public, shareable link (e.g., from colab.research.google.com) to ensure everyone can access the calculator.


#### *max model len vs. most model len*

![image](most_model_len.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The image most_model_len.png is referenced here, but it doesn't seem to be included in this pull request. This will result in a broken image in the documentation. Please add the image file to the repository, for example in the same directory as this README.


### **Tune your workloads!**

Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](https://github.com/vllm-project/vllm/pull/20779/files?short_path=f9b273a#diff-f9b273a10e0688ba63c38bd93a2e64ceb54d4fdd7ff7b82d347df06d0d34e39c) to optimize your workloads for your use case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Linking to a pull request is not ideal for documentation as PRs are not permanent resources in the same way that documentation pages are. The link is also very specific and might break easily. It would be better to link to the official documentation for the auto-tuner feature once it's available, or to a more stable resource.

@bvrockwell bvrockwell marked this pull request as draft July 18, 2025 00:53
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation tpu Related to Google TPUs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant