[core] Support capture custom ops into aclgraph #2113
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
@ttanzhiqiang Please review this PR as well.
@yiz-liu Please review this PR as well.
Codecov Report

❌ Patch coverage is 51.85%. Your patch check has failed because the patch coverage (51.85%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Coverage diff against main (#2113):

| | main | #2113 | +/- |
| --- | --- | --- | --- |
| Coverage | 76.09% | 76.04% | -0.05% |
| Files | 114 | 115 | +1 |
| Lines | 13103 | 13130 | +27 |
| Hits | 9971 | 9985 | +14 |
| Misses | 3132 | 3145 | +13 |

Flags with carried forward coverage won't be shown. View the full report in Codecov.
    with torch.npu.graph(aclgraph):
        # Capture the model in aclgraph.
        static_output = compiled_model(static_positions, static_hidden_states)
This is capturing with a static shape. If the shape of static_positions or static_hidden_states changes, does the meta implementation need to run again?
Yes, if the input shapes change, the meta implementation needs to run again.
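A minimal sketch of what this means in practice, assuming `torch.npu.graph`/`replay()` mirror the CUDA Graphs API; the helper below is hypothetical and not part of this PR. New inputs with the same shape are copied into the static tensors before replay, while a shape change forces re-tracing (which is when the meta implementations run again).

```python
import torch
import torch_npu  # noqa: F401  # assumed to provide the NPU backend and torch.npu.graph


def run_with_new_inputs(aclgraph, static_positions, static_hidden_states,
                        static_output, new_positions, new_hidden_states):
    # The shapes were frozen at capture time, so new data with the *same*
    # shape/dtype is copied into the static tensors used during capture.
    static_positions.copy_(new_positions)
    static_hidden_states.copy_(new_hidden_states)
    # Replay writes the result into the static output tensor captured earlier.
    aclgraph.replay()
    return static_output.clone()
```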
Can custom ops be written in a way that handles aclgraph without adding a meta implementation for each operator one at a time?
Not for now. We could try to write a macro or template to automatically generate the meta implementations, but there is no plan for this yet. We can consider it once we have enough custom ops.
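Concretely, writing a meta ("fake") implementation for one op at a time could look like the sketch below; the schema `myops::rotary_embedding` and its signature are hypothetical, and `torch.library.register_fake` requires a recent PyTorch (older releases used `torch.library.impl_abstract`). Only output shapes and dtypes are described, which is what lets `torch.compile` trace the op into the FX graph.

```python
import torch

# Hypothetical schema; the real custom ops in this PR are registered from C++.
torch.library.define(
    "myops::rotary_embedding",
    "(Tensor positions, Tensor query, Tensor key) -> (Tensor, Tensor)",
)


@torch.library.register_fake("myops::rotary_embedding")
def _rotary_embedding_fake(positions, query, key):
    # No computation here: only describe the metadata of the outputs so the
    # FX tracer can propagate shapes/dtypes through the custom op.
    return torch.empty_like(query), torch.empty_like(key)
```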
Force-pushed from cf0022c to 6e40e45.
LGTM. Please make the CI happy; I think we should merge this with high priority.
The aclgraph path seems to have accuracy that is not aligned with the eager path on CI; I can't reproduce it in my local environment.
Got it, I'll give it a try.
Force-pushed from e44c707 to adeb6d0.
Force-pushed from f2b6012 to 21527d1.
Signed-off-by: ganyi <[email protected]>
Force-pushed from 21527d1 to 951506e.
What this PR does / why we need it?
Thanks to PR #426, vllm-ascend supports aclgraph inference to reduce the host overhead. However, the capability of aclgraph strongly relies on the functionality provided by `torch.compile`, which is a key feature of torch 2.x. Therefore, capturing a custom op into aclgraph is only possible when it can be recognized and captured by `torch.compile`.

In this PR, we register the meta implementations of the current custom ops to enable fx graph capture. By doing that, inserting those custom ops into aclgraph becomes natural for the Ascend runtime.
Does this PR introduce any user-facing change?
No user-facing change.
How was this patch tested?
Tested in a unit test: we integrate the `rotary_embedding` op into a small custom model and use `torch.compile` and aclgraph to capture and replay it to verify its functionality.
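As a rough, hypothetical sketch of that flow (not the actual test code), reusing the made-up `myops::rotary_embedding` registration above and assuming `torch.npu.NPUGraph` mirrors `torch.cuda.CUDAGraph`:

```python
import torch
import torch_npu  # noqa: F401  # assumed to provide the NPU device and torch.npu.graph


class TinyModel(torch.nn.Module):
    def forward(self, positions, hidden_states):
        query, key = hidden_states.chunk(2, dim=-1)
        # Hypothetical call site of the custom op that has a registered meta impl.
        query, key = torch.ops.myops.rotary_embedding(positions, query, key)
        return torch.cat([query, key], dim=-1)


model = TinyModel().npu()
compiled_model = torch.compile(model)

static_positions = torch.zeros(4, dtype=torch.long, device="npu")
static_hidden_states = torch.randn(4, 128, device="npu")

# Warm up so torch.compile finishes tracing before graph capture.
compiled_model(static_positions, static_hidden_states)

aclgraph = torch.npu.NPUGraph()
with torch.npu.graph(aclgraph):
    static_output = compiled_model(static_positions, static_hidden_states)

# Replay the captured graph and compare against eager execution.
aclgraph.replay()
expected = model(static_positions, static_hidden_states)
torch.testing.assert_close(static_output, expected)
```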