Enable multi-image support benchmarking for serving #21145

leopck · 2025-07-17T21:57:51Z

Purpose

This PR updates the online benchmark script (benchmarks/benchmark_serving.py) to support multi-image prompts. This capability is necessary for accurately benchmarking multimodal models using dataset that support more than one image e.g. MUIRBENCH.

Test Plan

To validate the changes, run the online benchmark script pointing to a multi-image dataset. The following command was used for testing with the MUIRBENCH dataset:

python3 benchmarks/benchmark_serving.py \
    --backend openai-chat \
    --endpoint /chat/completions \
    --base-url http://127.0.0.1:8688/v1 \
    --model gemma/gemma-3-27b-it \
    --request-rate "inf" \
    --percentile-metrics ttft,tpot,itl,e2el \
    --ignore-eos \
    --num-prompt 128 \
    --port 8688 \
    --dataset-path MUIRBENCH/MUIRBENCH \
    --dataset-name hf \
    --max-concurrency 128 \
    --save-result

Test Result

The results from the test command are provided below:

============ Serving Benchmark Result ============
Successful requests:                     128
Benchmark duration (s):                  116.25
Total input tokens:                      2808
Total generated tokens:                  16384
Request throughput (req/s):              1.10
Output token throughput (tok/s):         140.93
Total Token throughput (tok/s):          165.09
---------------Time to First Token----------------
Mean TTFT (ms):                          73745.15
Median TTFT (ms):                        71047.68
P99 TTFT (ms):                           111490.03
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          194.42
Median TPOT (ms):                        162.71
P99 TPOT (ms):                           425.83
---------------Inter-token Latency----------------
Mean ITL (ms):                           192.97
Median ITL (ms):                         40.98
P99 ITL (ms):                            1227.04
----------------End-to-end Latency----------------
Mean E2EL (ms):                          98436.22
Median E2EL (ms):                        88044.50
P99 E2EL (ms):                           116240.25
==================================================

Allows the online benchmark script to correctly process and send requests containing multiple images per prompt, which is required for datasets like MUIRBENCH. Example on how to run this: ```sh python3 benchmarks/benchmark_serving.py \ --backend openai-chat \ --endpoint /chat/completions \ --base-url http://127.0.0.1:8688/v1 \ --model gemma/gemma-3-27b-it \ --request-rate "inf" \ --percentile-metrics ttft,tpot,itl,e2el \ --ignore-eos \ --num-prompt 128 \ --port 8688 \ --dataset-path MUIRBENCH/MUIRBENCH \ --dataset-name hf \ --max-concurrency 128 \ --save-result ``` And here is the results: ```sh ============ Serving Benchmark Result ============ Successful requests: 128 Benchmark duration (s): 116.25 Total input tokens: 2808 Total generated tokens: 16384 Request throughput (req/s): 1.10 Output token throughput (tok/s): 140.93 Total Token throughput (tok/s): 165.09 ---------------Time to First Token---------------- Mean TTFT (ms): 73745.15 Median TTFT (ms): 71047.68 P99 TTFT (ms): 111490.03 -----Time per Output Token (excl. 1st token)------ Mean TPOT (ms): 194.42 Median TPOT (ms): 162.71 P99 TPOT (ms): 425.83 ---------------Inter-token Latency---------------- Mean ITL (ms): 192.97 Median ITL (ms): 40.98 P99 ITL (ms): 1227.04 ----------------End-to-end Latency---------------- Mean E2EL (ms): 98436.22 Median E2EL (ms): 88044.50 P99 E2EL (ms): 116240.25 ================================================== ``` Signed-off-by: Stanley Phoong <[email protected]>

github-actions · 2025-07-17T21:57:58Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request introduces multi-image support for serving benchmarks. It adds the MuirBenchDataset and updates the request handling logic for the OpenAI Chat backend. There are two potential issues that should be addressed to ensure the code functions correctly.

benchmarks/backend_request_func.py

benchmarks/benchmark_dataset.py

Signed-off-by: Stanley Phoong <[email protected]>

xuechendi · 2025-07-17T22:36:47Z

@DarkLight1337 , may you help to review this PR?

Signed-off-by: Stanley Phoong <[email protected]>

leopck · 2025-07-18T17:06:50Z

@mgoin and @ywang96 may you help review and approve this PR? appreciate your review on this PR!

mergify bot added the performance Performance-related issues label Jul 17, 2025

leopck mentioned this pull request Jul 17, 2025

Enable multi-image support benchmarking for online mode HabanaAI/vllm-fork#1615

Open

gemini-code-assist bot reviewed Jul 17, 2025

View reviewed changes

benchmarks/backend_request_func.py Outdated Show resolved Hide resolved

benchmarks/benchmark_dataset.py Show resolved Hide resolved

Flattening to align the structure

114b6d7

Signed-off-by: Stanley Phoong <[email protected]>

Else case to catch if not list or dict

ede98c8

Signed-off-by: Stanley Phoong <[email protected]>

DarkLight1337 requested review from mgoin and ywang96 July 18, 2025 03:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Enable multi-image support benchmarking for serving #21145

Enable multi-image support benchmarking for serving #21145

leopck commented Jul 17, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Jul 17, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

xuechendi commented Jul 17, 2025

Uh oh!

leopck commented Jul 18, 2025

Uh oh!

Uh oh!

Uh oh!

Enable multi-image support benchmarking for serving #21145

Are you sure you want to change the base?

Enable multi-image support benchmarking for serving #21145

Conversation

leopck commented Jul 17, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

github-actions bot commented Jul 17, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

xuechendi commented Jul 17, 2025

Uh oh!

leopck commented Jul 18, 2025

Uh oh!

Uh oh!

leopck commented Jul 17, 2025 •

edited by github-actions bot

Loading