Use gather to replace gather nd when the bool index is one-dimensional #72625

Open
wants to merge 1 commit into base: develop

Conversation

zhanghonggeng
Contributor

@zhanghonggeng zhanghonggeng commented May 8, 2025

PR Category

Performance Optimization

PR Types

Performance

Description

Case:

out = a[val == 1]
and
t1 = (val == 1).nonzero().flatten()
out = paddle.gather(a, t1)

When the bool index is one-dimensional, using gather instead of gather_nd gives better performance.
Test machine: V100
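To make the equivalence of the two forms concrete, here is a minimal pure-Python reference model. It is only a sketch of the semantics, not Paddle's kernels: `bool_index`, `nonzero_flat`, and `gather` are hypothetical helper names that stand in for `a[mask]`, `mask.nonzero().flatten()`, and `paddle.gather(a, t1)` along axis 0, and nested lists stand in for tensors.

```python
# Pure-Python reference model of the two indexing paths.
# All helper names are hypothetical; they are not Paddle APIs.

def bool_index(rows, mask):
    """Model of out = a[mask] for a one-dimensional bool mask over axis 0."""
    if len(rows) != len(mask):
        raise ValueError("mask length must match the first dimension")
    return [row for row, keep in zip(rows, mask) if keep]

def nonzero_flat(mask):
    """Model of mask.nonzero().flatten(): positions of the True entries."""
    return [i for i, keep in enumerate(mask) if keep]

def gather(rows, indices):
    """Model of gathering whole rows along axis 0 by integer index."""
    return [rows[i] for i in indices]

a = [[10, 11], [20, 21], [30, 31], [40, 41]]
val = [1, 0, 1, 0]
mask = [v == 1 for v in val]

via_mask = bool_index(a, mask)   # direct bool indexing
t1 = nonzero_flat(mask)          # integer positions of the True entries
via_gather = gather(a, t1)       # same rows, selected by position
```

Because a one-dimensional mask selects whole slices along the first axis, the integer positions of its True entries are enough to reproduce the result, so both paths return the same rows.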

| Python API | Shape, dtype | Average time per run (ms) | Speedup |
| --- | --- | --- | --- |
| paddle [] gather_nd (not vectorized) | [108, 64, 12288], [108]; paddle.bfloat16, paddle.bool | 2201.24 | |
| paddle [] gather_nd (vectorized) | [108, 64, 12288], [108]; paddle.bfloat16, paddle.bool | 513.40 | 4.29 |
| paddle [] gather | [108, 64, 12288], [108]; paddle.bfloat16, paddle.bool | 289.12 | 7.61 |
| paddle.gather | [108, 64, 12288], [108]; paddle.bfloat16, paddle.int64 | 300.64 | 7.32 |
| torch [] | [108, 64, 12288], [108]; torch.bfloat16, torch.bool | 510.85 | 4.31 |
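The speedup column appears to be the non-vectorized gather_nd time divided by each row's time. That reading is an assumption, but it reproduces the reported figures, as this short check shows (the dictionary keys are just labels chosen here):

```python
# Recompute the speedup column, assuming speedup = baseline time / row time.
baseline_ms = 2201.24  # paddle [] with non-vectorized gather_nd

timings_ms = {
    "paddle [] gather_nd (vectorized)": 513.40,
    "paddle [] gather": 289.12,
    "paddle.gather": 300.64,
    "torch []": 510.85,
}

speedups = {name: round(baseline_ms / ms, 2) for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```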



The all-False case was also tested. When the index is a single scalar False, Paddle is slower than torch; in all other cases there is an improvement.
Case: index = False
get_item paddle_gpu: 82.50 us, torch_gpu: 34.36 us, Paddle/Torch GPU score: 2.40
get_item paddle_cpu: 100.40 us, torch_cpu: 47.92 us, Paddle/Torch CPU score: 2.10

All False: index = [False, False, False, ...]
get_item paddle_gpu: 80.36 us, torch_gpu: 95.80 us, Paddle/Torch GPU score: 0.84
get_item paddle_cpu: 100.39 us, torch_cpu: 113.18 us, Paddle/Torch CPU score: 0.89

Mixed: index = [False, True, False, False, True, ...]
get_item paddle_gpu: 479.26 us, torch_gpu: 587.41 us, Paddle/Torch GPU score: 0.82
get_item paddle_cpu: 497.92 us, torch_cpu: 598.02 us, Paddle/Torch CPU score: 0.83
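The Paddle/Torch score looks like the Paddle time divided by the torch time, so a value below 1 means Paddle is faster. That interpretation is an assumption, but it matches the numbers above. A small sketch that recomputes the scores and flags the cases where Paddle wins (the tuple labels are made up for this example):

```python
# Recompute Paddle/Torch scores, assuming score = paddle time / torch time.
# Each entry: (case label, device) -> (paddle_us, torch_us)
timings_us = {
    ("scalar False", "gpu"): (82.50, 34.36),
    ("scalar False", "cpu"): (100.40, 47.92),
    ("all False", "gpu"): (80.36, 95.80),
    ("all False", "cpu"): (100.39, 113.18),
    ("mixed", "gpu"): (479.26, 587.41),
    ("mixed", "cpu"): (497.92, 598.02),
}

scores = {key: round(paddle / torch, 2) for key, (paddle, torch) in timings_us.items()}

# A score below 1.0 means the Paddle path took less time than torch.
paddle_faster = sorted(key for key, score in scores.items() if score < 1.0)

for key, score in scores.items():
    verdict = "Paddle faster" if score < 1.0 else "torch faster"
    print(key, score, verdict)
```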
Pcard-67164

paddle-bot bot commented May 8, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Contributor

@changeyoung98 changeyoung98 left a comment

LGTM

3 participants