[ROCm] Effort to reduce the number of environment variables in command line #17229
Conversation
Signed-off-by: Hongxia Yang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Can we add a test plan?
@@ -114,6 +114,12 @@ COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples
ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
ENV TOKENIZERS_PARALLELISM=false

# ENV that can improve safe tensor loading, and end-to-end time
ENV SAFETENSORS_FAST_GPU=1
ENV that can improve safe tensor loading,
I didn't find this variable in the vllm repository. Could you remind me why it can improve loading time?
This is for safetensors; see https://huggingface.co/docs/safetensors/en/speed for more details.
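For context, here is a minimal sketch of how the flag is used, following the linked safetensors speed docs (the checkpoint path is hypothetical; this is not part of this PR's code):

import os

# Must be set before the load call; safetensors reads it to enable its
# direct-to-GPU loading path (assumption based on the linked speed docs).
os.environ["SAFETENSORS_FAST_GPU"] = "1"

from safetensors.torch import load_file

# Hypothetical checkpoint path; tensors are materialized directly on the GPU,
# avoiding an intermediate CPU copy of the weights.
state_dict = load_file("model.safetensors", device="cuda:0")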
docker/Dockerfile.rocm
Outdated
# ENV that can improve safe tensor loading, and end-to-end time
ENV SAFETENSORS_FAST_GPU=1
# ENV that needed for multi-process on cuda-like platform
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
Can we add a comment to elaborate on why spawn is needed here? Is it due to some compatibility issue?
Can we add a comment to elaborate on why spawn is needed here? Is it due to some compatibility issue?
There was a time when this was fixed for ROCm, and the fix was to force spawn on the ROCm platform. See the issue below:
#7791
However, during Llama 4 enablement in March, I found that the issue had returned when running simple scripts, and we have been setting the env var in our scripts since then.
So, to keep things safe and stable in all situations, I think putting it in the Dockerfile is the user-friendly thing to do.
Right now, the default is set to "fork" in envs.py,
# Use dedicated multiprocess context for workers.
# Both spawn and fork work
"VLLM_WORKER_MULTIPROC_METHOD":
    lambda: os.getenv("VLLM_WORKER_MULTIPROC_METHOD", "fork"),
But if we search the code, there are many places that force VLLM_WORKER_MULTIPROC_METHOD=spawn.
For example:
if reason is not None:
    logger.warning(
        "We must use the `spawn` multiprocessing start method. "
        "Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. "
        "See https://docs.vllm.ai/en/latest/getting_started/"
        "troubleshooting.html#python-multiprocessing "
        "for more information. Reason: %s", reason)
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
Can we add a comment to elaborate on why spawn is needed here? Is it due to some compatibility issue?
@houseroad Done
Added the test plan in the description of the pull request.
Looks good.
[ROCm] Effort to reduce the number of environment variables in command line (vllm-project#17229) Signed-off-by: Hongxia Yang <[email protected]>
This PR sets two environment variables in the Dockerfile so that users can pass fewer environment variables on the command line when running scripts.
SAFETENSORS_FAST_GPU=1: can improve safetensors loading and end-to-end time.
VLLM_WORKER_MULTIPROC_METHOD=spawn: needed for multi-processing on CUDA-like platforms.
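As a quick sanity check (a hedged sketch; it assumes the vllm.envs module is importable inside the image), a container built from this Dockerfile should report both values without any per-command exports:

import os

import vllm.envs as envs

# Both values come from the Dockerfile ENV lines, not from the command line.
assert os.environ.get("SAFETENSORS_FAST_GPU") == "1"
assert envs.VLLM_WORKER_MULTIPROC_METHOD == "spawn"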
Test: