[Bugfix] PTPC gemm invocation and general gemm #612

tjtanaavllm · 2025-07-29T16:31:12Z

Per-Token-Activation, Per-Channel-Weight (PTPC) quantized models:

(With AITER Linear)
vllm (pretrained=RedHatAI/Qwen3-235B-A22B-FP8-dynamic,tensor_parallel_size=8,enable_expert_parallel=True,add_bos_token=True,max_model_len=10000,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 8, batch_size: 500
|  Tasks  |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot|      3|flexible-extract|     8|exact_match|↑  |0.9022|±  |0.0082|
|         |       |strict-match    |     8|exact_match|↑  |0.8324|±  |0.0103|
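For readers unfamiliar with the scheme, the following is a minimal sketch of what PTPC scaling means for a GEMM: one scale per activation row (token) and one scale per weight row (output channel), applied as an outer product after the low-precision matmul. It is not the AITER or vLLM kernel. The names `quantize_rows` and `ptpc_gemm` are hypothetical, `QMAX = 448.0` is an assumed FP8 E4M3 maximum (hardware variants differ), and integer rounding stands in for a real FP8 cast.

```python
import numpy as np

QMAX = 448.0  # assumed max magnitude of FP8 E4M3; some hardware formats use a different value


def quantize_rows(x, qmax=QMAX):
    """Symmetric quantization with one scale per row of x (illustrative helper)."""
    amax = np.abs(x).max(axis=1, keepdims=True)
    scale = np.maximum(amax, 1e-12) / qmax          # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax)   # rounding stands in for the FP8 cast
    return q, scale


def ptpc_gemm(a, w):
    """a: [tokens, k] activations; w: [n, k] weights, one row per output channel."""
    a_q, a_scale = quantize_rows(a)   # a_scale: [tokens, 1], per-token
    w_q, w_scale = quantize_rows(w)   # w_scale: [n, 1], per-channel
    acc = a_q @ w_q.T                 # low-precision GEMM, shape [tokens, n]
    return acc * a_scale * w_scale.T  # rescale by the outer product of the two scale vectors


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64))
w = rng.standard_normal((8, 64))
ref = a @ w.T
out = ptpc_gemm(a, w)
rel_err = np.abs(out - ref).max() / np.abs(ref).max()
print(f"max relative error vs. full precision: {rel_err:.4f}")
```

Because both scale vectors are applied after the matmul, a kernel that receives them in the wrong shape or order still runs but produces wrong numbers, which is why accuracy runs like the ones reported here are a useful check.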

(Without AITER Linear, after fixing the GEMM dispatcher)
vllm (pretrained=EmbeddedLLM/Qwen2.5-32B-Instruct-FP8-Dynamic,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 8, batch_size: 128
|  Tasks  |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot|      3|flexible-extract|     8|exact_match|↑  |0.7566|±  |0.0118|
|         |       |strict-match    |     8|exact_match|↑  |0.8135|±  |0.0107|

(With AITER Linear)
vllm (pretrained=EmbeddedLLM/Qwen2.5-32B-Instruct-FP8-Dynamic,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 8, batch_size: 128
|  Tasks  |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot|      3|flexible-extract|     8|exact_match|↑  |0.7672|±  |0.0116|
|         |       |strict-match    |     8|exact_match|↑  |0.8324|±  |0.0103|
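As a rough sanity check on the two Qwen2.5-32B runs, the sketch below recomputes the reported standard errors from the accuracies and compares the flexible-extract scores with and without AITER Linear. It assumes 1319 evaluated samples (the GSM8K test-set size, not stated in this PR) and treats the two runs as independent, which is only an approximation since both runs score the same questions. The function names are hypothetical.

```python
import math

N_SAMPLES = 1319  # assumption: full GSM8K test set; the PR does not state the sample count


def binomial_stderr(p, n=N_SAMPLES):
    """Standard error of a mean of 0/1 scores, using the sample (n - 1) variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(p1, p2, n=N_SAMPLES):
    """Difference between two accuracies in units of their combined standard error."""
    se = math.sqrt(binomial_stderr(p1, n) ** 2 + binomial_stderr(p2, n) ** 2)
    return (p1 - p2) / se


without_aiter = 0.7566  # flexible-extract, without AITER Linear
with_aiter = 0.7672     # flexible-extract, with AITER Linear

print(f"stderr without AITER: {binomial_stderr(without_aiter):.4f}")
print(f"stderr with AITER:    {binomial_stderr(with_aiter):.4f}")
print(f"z-score of the gap:   {z_score(with_aiter, without_aiter):.2f}")
```

Under these assumptions the gap between the two paths is less than one combined standard error, so the tables are consistent with both GEMM paths producing equivalent accuracy.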

…near Signed-off-by: tjtanaavllm <[email protected]>

github-actions · 2025-07-29T16:31:45Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

bug fix ptpc invocation of aiter kernel and edge case of non aiter li…

157ff77

…near Signed-off-by: tjtanaavllm <[email protected]>

tjtanaavllm requested review from charlifu, mawong-amd, shajrawi, gshtras, maleksan85, sunway513 and hongxiayang as code owners July 29, 2025 16:31

tjtanaavllm removed request for sunway513, charlifu, shajrawi, gshtras, mawong-amd and maleksan85 July 29, 2025 16:31

tjtanaavllm requested a review from wuhuikx July 29, 2025 16:32

tjtanaavllm changed the title ~~[Bugfix] PTPC and gemm dispatcher~~ [Bugfix] PTPC gemm invocation and general gemm Jul 30, 2025

wuhuikx approved these changes Jul 30, 2025

tjtanaavllm merged commit 4701b62 into llama_fp8_03122025 Jul 30, 2025
9 of 10 checks passed
