align_grad_clip #74080

Merged
merged 5 commits into from
Jul 24, 2025

zty-king
Contributor

PR Category

Auto Parallel

PR Types

Others

Description

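  • Under pipeline parallelism (pp), a model_chunk that does not belong to the current rank is never computed there (neither forward nor backward), so the gradients of its parameters are None. When ClipGradByGlobalNorm computes the global gradient-clipping coefficient, each pp_stage therefore only computes the _squared_l2_norm of the parameter gradients for the model_chunks that belong to that stage and sums them (that is, it squares every element of every gradient and adds the results). The resulting global_norm_var is actually only a local partial sum. Clipping gradients with it is logically incorrect, so this needs to be fixed.
  • By default the fix aligns with the dynamic-graph manual-parallel ("动手") behavior: after each pp_stage computes its local sum, the local sums are all-gathered to every stage and then added together, which yields a true global_norm_var. The benefits of this approach: it fits pp better, the communication volume is smaller, and it is more efficient.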
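
To make the difference between the local partial sum and the true global sum concrete, here is a minimal pure-Python sketch. It is not the code from this PR and does not call Paddle: the two stages, their gradient values, and the helper `simulated_all_gather` are hypothetical stand-ins (a real run would use a collective all-gather across the pp group). The clipping coefficient follows the usual global-norm form `clip_norm / max(global_norm, clip_norm)`.

```python
import math


def local_squared_sum(grads):
    """Sum of squares over every element of the gradients one stage holds.

    Gradients that are None (parameters of model chunks this stage does not
    own) are skipped, mirroring the situation described above.
    """
    return sum(x * x for g in grads if g is not None for x in g)


def simulated_all_gather(local_values):
    """Hypothetical stand-in for a collective all-gather.

    Every stage ends up with the full list of per-stage values.
    """
    return [list(local_values) for _ in local_values]


def clip_coefficient(squared_sum, clip_norm):
    """Scale factor applied to each gradient for global-norm clipping."""
    norm = math.sqrt(squared_sum)
    return clip_norm / max(norm, clip_norm)


# Two hypothetical pipeline stages. Each stage sees None for the parameter
# that lives on the other stage.
stage_grads = [
    [[3.0, 0.0], None],  # stage 0 owns the first parameter
    [None, [0.0, 4.0]],  # stage 1 owns the second parameter
]
clip_norm = 1.0

# Before the fix: each stage clips with its own partial sum.
local_sums = [local_squared_sum(g) for g in stage_grads]
buggy_coeffs = [clip_coefficient(s, clip_norm) for s in local_sums]

# After the fix: gather the partial sums on every stage, then add them.
gathered = simulated_all_gather(local_sums)
global_sums = [sum(values) for values in gathered]
fixed_coeffs = [clip_coefficient(s, clip_norm) for s in global_sums]

print("per-stage coefficients before:", buggy_coeffs)
print("per-stage coefficients after: ", fixed_coeffs)
```

In this toy setup the stages disagree on the coefficient before the fix and agree after it. Only one scalar per stage is exchanged, which is consistent with the low communication volume mentioned in the description.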

paddle-bot bot commented Jul 16, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Jul 16, 2025
@codecov-commenter

codecov-commenter commented Jul 16, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 14 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@159fd7c). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
python/paddle/nn/clip.py | 33.33% | 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #74080   +/-   ##
==========================================
  Coverage           ?   33.33%           
==========================================
  Files              ?        1           
  Lines              ?       21           
  Branches           ?        0           
==========================================
  Hits               ?        7           
  Misses             ?       14           
  Partials           ?        0           

@zty-king
Contributor Author

/re-run all-failed

@zty-king
Contributor Author

Adding the local coverage test results below:
image
image

@xuxinyi389
Contributor

LGTM

@pkuzyc pkuzyc merged commit a6ea965 into PaddlePaddle:develop Jul 24, 2025
76 of 79 checks passed