Skip to content

[core] Support capture custom ops into aclgraph #2113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

ganyi1996ppo
Copy link
Collaborator

@ganyi1996ppo ganyi1996ppo commented Jul 30, 2025

What this PR does / why we need it?

Thanks to the PR #426 make vllm-ascend support the aclgraph inference to reduce the host overhead. However, the capability of aclgraph strongly relies on the functionality provided by torch.compile, which is the key feature supported in torch 2.x . Therefore, capture custom op into aclgraph is only possible when it can be recognize and captured by torch.compile.

In this PR, we register the meta implementation of current custom ops to enable the fx graph capture. And by doing that, insert those custom ops into aclgraph become a natural thing to the ascend runtime.

Does this PR introduce any user-facing change?

No user face change.

How was this patch tested?

Tested in unittest, we will integrate the rotary_embedding op into a small custom model and use torch.compile and aclgraph to capture and replay it to verify its functionality.

Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@ganyi1996ppo ganyi1996ppo requested review from Yikun and wangxiyuan July 30, 2025 10:32
@ganyi1996ppo ganyi1996ppo marked this pull request as ready for review July 30, 2025 10:32
@ganyi1996ppo
Copy link
Collaborator Author

@ttanzhiqiang Please review this PR also

@ganyi1996ppo
Copy link
Collaborator Author

@yiz-liu Please review this PR also

Copy link

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 51.85185% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.04%. Comparing base (0bd5ff5) to head (951506e).

Files with missing lines Patch % Lines
vllm_ascend/meta_registration.py 50.00% 13 Missing ⚠️

❌ Your patch check has failed because the patch coverage (51.85%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2113      +/-   ##
==========================================
- Coverage   76.09%   76.04%   -0.05%     
==========================================
  Files         114      115       +1     
  Lines       13103    13130      +27     
==========================================
+ Hits         9971     9985      +14     
- Misses       3132     3145      +13     
Flag Coverage Δ
unittests 76.04% <51.85%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


with torch.npu.graph(aclgraph):
# Capture the model in aclgraph.
static_output = compiled_model(static_positions, static_hidden_states)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place is a static shape. If the shape of static_positions, static_hidden_states has changed, does meta need to go through again?

Copy link
Collaborator Author

@ganyi1996ppo ganyi1996ppo Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@ttanzhiqiang
Copy link
Contributor

Can custom ops be specially written to handle the problem of aclgraph without adding an operator to a meta at a time

@ganyi1996ppo
Copy link
Collaborator Author

Can custom ops be specially written to handle the problem of aclgraph without adding an operator to a meta at a time

No for now. We can try to write a macro or template to automatically generate the meta implementation, but no plan for this yet. We can consider this after we have enough number of custom ops.

@wangxiyuan
Copy link
Collaborator

LGTM, Please make the CI happy, I think we should merge this in high priority

@ganyi1996ppo
Copy link
Collaborator Author

LGTM, Please make the CI happy, I think we should merge this in high priority

The aclgraph path seems have unaligned accuracy compared with eager path on CI, I can't reproduce it on my local environment.

@MengqingCao
Copy link
Collaborator

MengqingCao commented Aug 6, 2025

It seems the first failed case is acc test case due to the model download issue. I think you can rebase your code, as acc test will not run per pr now.
image

@ganyi1996ppo
Copy link
Collaborator Author

It seems the first failed case is acc test case due to the model download issue. I think you can rebase your code, as acc test will not run per pr now.

Got, I'll give it a try

@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/meta_registration branch from e44c707 to adeb6d0 Compare August 6, 2025 07:49
@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/meta_registration branch 3 times, most recently from f2b6012 to 21527d1 Compare August 8, 2025 02:24
@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/meta_registration branch from 21527d1 to 951506e Compare August 8, 2025 04:36
@Yikun Yikun added accuracy-test enable all accuracy test for PR ready-for-test start test by label for PR labels Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accuracy-test enable all accuracy test for PR module:core module:tests ready-for-test start test by label for PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants