[0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None #1831

Merged (1 commit) on Jul 21, 2025

MengqingCao
Collaborator

@MengqingCao MengqingCao commented Jul 16, 2025

What this PR does / why we need it?

This PR fixes a bug that raises the error "self.cpu_group is None". The error is mainly caused by incorrect group ranks for the process groups maintained in vllm-ascend. We need to take the external DP size into account so that these groups work correctly in external_launch mode.

Related fixes: #1396 #1154

@MengqingCao MengqingCao marked this pull request as ready for review July 17, 2025 02:03
@weijinqian0
Contributor

Offline scenario verification passed! Thanks!

@wangxiyuan changed the title from "[Dist][Bugfix] Fix mc2 process group" to "[0.9.1][Dist][Bugfix] Fix mc2 process group" on Jul 18, 2025
Collaborator

@Yikun Yikun left a comment


cc @ganyi1996ppo Please also take a look

@Yikun changed the title from "[0.9.1][Dist][Bugfix] Fix mc2 process group" to "[0.9.1][Dist][Bugfix] Fix mc2 process group to resolve self.cpu_group is None" on Jul 20, 2025
@ganyi1996ppo
Collaborator

Quoting the PR description:

What this PR does / why we need it?

This PR fixes a bug that raises the error "self.cpu_group is None". The error is mainly caused by incorrect group ranks for the process groups maintained in vllm-ascend. We need to take the external DP size into account so that these groups work correctly in external_launch mode.

Related fixes: #1396 #1154

Does "external DP" here mean that the process group is built outside of vLLM?

@MengqingCao
Collaborator Author

MengqingCao commented Jul 20, 2025 via email

@ganyi1996ppo ganyi1996ppo merged commit 3a34b11 into vllm-project:v0.9.1-dev Jul 21, 2025
16 checks passed
5 participants