
[PHI][CINN] Fix grid sample kernel for big tensor #72628


Open
lshpku wants to merge 1 commit into develop from fix-grid-sample


lshpku
Contributor

@lshpku lshpku commented May 8, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

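Instantiate separate int and int64_t versions of GridSampleCudaKernel, and optimize the expressions in the kernel where appropriate.

Because this kernel's expressions fall into the "clearly rather complex" category, int64 support is implemented by instantiating the kernel separately for int and int64_t.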
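The "instantiate twice" approach can be sketched on the host side as follows. This is a minimal illustration, not the PR's code: `ScaleKernel`, `FitsInInt32`, and `LaunchScale` are hypothetical names, the loop body stands in for the real kernel, and the choice of `INT32_MAX` elements as the switch-over point is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the kernel body. Every index computation uses
// IndexT, so the int instantiation keeps 32-bit arithmetic and only the
// int64_t instantiation pays for 64-bit arithmetic.
template <typename T, typename IndexT>
void ScaleKernel(const T* in, T* out, IndexT numel, T factor) {
  for (IndexT i = 0; i < numel; ++i) {
    out[i] = in[i] * factor;
  }
}

// True when every flat offset of a tensor with `numel` elements fits in a
// signed 32-bit integer (assumed switch-over rule).
inline bool FitsInInt32(int64_t numel) {
  return numel <= std::numeric_limits<int32_t>::max();
}

// Hypothetical launcher: chooses the instantiation at run time and returns
// the width in bytes of the index type that was used.
template <typename T>
int LaunchScale(const T* in, T* out, int64_t numel, T factor) {
  assert(numel >= 0);
  if (FitsInInt32(numel)) {
    ScaleKernel<T, int>(in, out, static_cast<int>(numel), factor);
    return static_cast<int>(sizeof(int));
  }
  ScaleKernel<T, int64_t>(in, out, numel, factor);
  return static_cast<int>(sizeof(int64_t));
}
```

Calling `LaunchScale<float>(in.data(), out.data(), n, 2.f)` on a small tensor takes the int path. Only tensors whose element count exceeds the 32-bit range take the int64_t path, so ordinary workloads should not pay for the wider arithmetic.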

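The optimizations include:

  1. Add the __restrict__ qualifier to the IO parameters, which lets loads/stores use the non-coherent variants and is theoretically a bit faster
  2. Merge multiple loads/stores into a single update of a local value, reducing the number of loads/stores
  3. Merge the H/W dimensions of grid and output to reduce index computation; these two dimensions are fully contiguous, so they can be treated as one dimension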
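The three optimizations above can be illustrated with a simplified CPU sketch of bilinear grid sampling. It is not the PR's kernel: the function names are hypothetical, and the layout (`input` as [N, C, H_in, W_in], `grid` as [N, H_out, W_out, 2], `output` as [N, C, H_out, W_out]), zero padding, and align_corners=true behavior are all assumptions. On the host, `__restrict__` (a GCC/Clang spelling) is only a no-aliasing promise; the non-coherent load path mentioned in the description applies to the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Reads plane[y, x], or 0 when (y, x) is out of bounds (zero padding).
template <typename T, typename IndexT>
inline T ReadOrZero(const T* __restrict__ plane, IndexT y, IndexT x,
                    IndexT h, IndexT w) {
  return (y >= 0 && y < h && x >= 0 && x < w) ? plane[y * w + x] : T(0);
}

// out_hw is H_out * W_out: grid and output are contiguous over these two
// dimensions, so one flat index replaces separate h and w indices.
template <typename T, typename IndexT>
void GridSampleBilinearSketch(const T* __restrict__ input,
                              const T* __restrict__ grid,
                              T* __restrict__ output,
                              IndexT n, IndexT c, IndexT in_h, IndexT in_w,
                              IndexT out_hw) {
  assert(in_h > 0 && in_w > 0);
  for (IndexT ni = 0; ni < n; ++ni) {
    for (IndexT hw = 0; hw < out_hw; ++hw) {
      // Map normalized coordinates in [-1, 1] to pixel coordinates.
      const IndexT g = (ni * out_hw + hw) * 2;
      const T ix = (grid[g] + T(1)) / T(2) * static_cast<T>(in_w - 1);
      const T iy = (grid[g + 1] + T(1)) / T(2) * static_cast<T>(in_h - 1);
      const IndexT x0 = static_cast<IndexT>(std::floor(ix));
      const IndexT y0 = static_cast<IndexT>(std::floor(iy));
      const T wx1 = ix - static_cast<T>(x0), wx0 = T(1) - wx1;
      const T wy1 = iy - static_cast<T>(y0), wy0 = T(1) - wy1;
      for (IndexT ci = 0; ci < c; ++ci) {
        const T* __restrict__ plane = input + (ni * c + ci) * in_h * in_w;
        // Accumulate the four corners in a local value, then store once.
        T value = T(0);
        value += ReadOrZero(plane, y0, x0, in_h, in_w) * wy0 * wx0;
        value += ReadOrZero(plane, y0, x0 + 1, in_h, in_w) * wy0 * wx1;
        value += ReadOrZero(plane, y0 + 1, x0, in_h, in_w) * wy1 * wx0;
        value += ReadOrZero(plane, y0 + 1, x0 + 1, in_h, in_w) * wy1 * wx1;
        output[(ni * c + ci) * out_hw + hw] = value;
      }
    }
  }
}
```

Because the function is templated on `IndexT`, the same body serves both the int and int64_t instantiations. A real CUDA kernel would typically map output elements to threads rather than loop over them, so the loop structure here is only for readability.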

Rough performance test:

  1. The new int version is 25% faster than the old version
  2. The new int64_t version is 2% slower than the new int version

Pcard-85711


paddle-bot bot commented May 8, 2025

Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
Please wait for the CI test results first. See the Paddle CI Manual for details.

@lshpku lshpku force-pushed the fix-grid-sample branch 2 times, most recently from 5cc631d to 041ab9e Compare May 9, 2025 02:47
@lshpku lshpku force-pushed the fix-grid-sample branch from 041ab9e to c87529b Compare May 9, 2025 02:59