[PHI][CINN] Fix grid sample kernel for big tensor #72628

Merged: 1 commit merged into PaddlePaddle:develop on May 14, 2025

Conversation

lshpku
Contributor

@lshpku lshpku commented May 8, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Instantiate separate int and int64_t versions of GridSampleCudaKernel, and apply some optimizations to the expressions in the kernel.

Since this kernel's expressions fall into the "clearly quite complex" category, int64 support is implemented by instantiating separate int and int64_t versions.
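As a rough illustration of the dual-instantiation approach (a hypothetical host-side sketch, not Paddle's actual kernel code): the body is templated on an index type and explicitly instantiated for both int and int64_t, and a dispatcher picks 32-bit indexing whenever the element count fits, so the common case keeps cheap 32-bit arithmetic while huge tensors still index correctly.

```cpp
#include <cstdint>

// Hypothetical stand-in for the kernel body, templated on the index type.
template <typename IndexT>
void GridSampleLike(const float* in, float* out, IndexT n) {
  for (IndexT i = 0; i < n; ++i) out[i] = in[i];  // placeholder for real work
}

// Explicitly instantiate both index-type variants.
template void GridSampleLike<int>(const float*, float*, int);
template void GridSampleLike<int64_t>(const float*, float*, int64_t);

// Dispatch: use int when every index fits in 32 bits, otherwise int64_t.
void Dispatch(const float* in, float* out, int64_t numel) {
  if (numel <= INT32_MAX) {
    GridSampleLike<int>(in, out, static_cast<int>(numel));
  } else {
    GridSampleLike<int64_t>(in, out, numel);
  }
}
```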

Optimizations include:

  1. Add the __restrict__ qualifier to the I/O parameters; this lets loads/stores use the non-coherent variants, which should in theory be slightly faster.
  2. Merge multiple loads/stores into a single value update, reducing the number of loads/stores.
  3. Fuse the H/W dimensions of grid and output to reduce index computation; since these two dimensions are fully contiguous, they can be treated as a single dimension.
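The three optimizations above can be sketched together in one loop nest (an illustrative host-side analog with hypothetical names, not the actual CUDA kernel): `__restrict__` on the I/O pointers, a register accumulator written back with exactly one store, and a single fused index over the contiguous H*W extent.

```cpp
#include <cstdint>

// Illustrative sketch of the bilinear-gather inner loop.
// nbr/wgt hold the 4 precomputed neighbor offsets and bilinear weights
// per output location; in_hw and out_hw are the fused H*W extents.
template <typename IndexT>
void BilinearGather(const float* __restrict__ input,   // [c][in_hw]
                    float* __restrict__ output,        // [c][out_hw]
                    const IndexT* __restrict__ nbr,    // [out_hw][4] offsets
                    const float* __restrict__ wgt,     // [out_hw][4] weights
                    IndexT c, IndexT in_hw, IndexT out_hw) {
  for (IndexT ch = 0; ch < c; ++ch) {
    for (IndexT i = 0; i < out_hw; ++i) {  // fused H*W index, no h/w split
      float value = 0.f;                   // accumulate in a register...
      for (int k = 0; k < 4; ++k) {
        value += wgt[4 * i + k] * input[ch * in_hw + nbr[4 * i + k]];
      }
      output[ch * out_hw + i] = value;     // ...then store exactly once
    }
  }
}
```

Because the pointers are declared non-aliasing, the compiler is free to keep `value` in a register and, on GPU, route the const loads through the read-only (non-coherent) cache path.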

Rough performance measurements:

  1. The new int version is about 25% faster than the old version.
  2. The new int64_t version is about 2% slower than the new int version.

Pcard-85711


paddle-bot bot commented May 8, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@lshpku lshpku force-pushed the fix-grid-sample branch 3 times, most recently from 041ab9e to c87529b Compare May 9, 2025 02:59
@lshpku lshpku force-pushed the fix-grid-sample branch from c87529b to e5d44e2 Compare May 13, 2025 07:33
Contributor

@wanghuancoder wanghuancoder left a comment


LGTM

@lshpku lshpku merged commit 2bde0e8 into PaddlePaddle:develop May 14, 2025
46 of 47 checks passed
wanghuancoder pushed a commit to wanghuancoder/Paddle that referenced this pull request May 27, 2025
wanghuancoder added a commit that referenced this pull request Jun 3, 2025
* refine forrange (#72360)

* refine forrange

* refine forrange

* reduce support big tensor (#71970)

* reduce support big tensor

* [PHI] Fix gridDim limit for reduce kernel (#72507)

* [API] isclose support bigtensor (#72516)

* isclose support bigtensor

* refine

* [API] isnan isinf isfinite support bigtensor (#72517)

* isnan isinf isfinite support bigtensor

* refine

* [PHI] Fix cum kernel for big tensor (#72562)

* [PHI] Preliminary fix for elementwise broadcast int32 shape overflow (#72584)

* [PHI] Align linalg.solve kernel with torch (#72608)

* Update strided copy kernel (#72662)

* [PHI] Fix grid sample kernel for big tensor (#72628)

* [PHI] Fix argsort big tensor bug (#72712)

* [PHI] Fixed argsort big tensor bug

* [PHI] Fixed shape mismatch problem.

* [PHI] Fix contiguous kernel for big tensor (#72705)

* [PHI] Fix flatten and split kernel for big tensor (#72634)

* [PHI] Fix out-of-bound issue of paddle.take_along_axis (#72757)

* [PHI] fix paddle.diag with big tensor (#72638)

* [API] fix paddle.cross with big tensor (#72652)

* [PHI] Fix paddle.where api for big tensor (#72717)

* [PHI] Fix bincount kernel for big tensor (#72706)

* fix bincount kernel for big tensor

* use HostAlloc to alloc memory

* add cpu test case

* [PHI] Fix full_like kernel for big tensor (#72831)

* [API] Fix int overflow and float16 support for paddle.frac (#72815)

* [PHI] Align paddle.inner with torch in matmul logic (#72843)

* [PHI] Fix paddle.var & paddle.std float16 overflow (#72650)

* [PHI] Fix logsumexp precision problem (#72681)

* [PHI] Debug for logsumexp, bug source found

* [PHI] Removed GetNumBlocks func to get correct logsumexp

* [PHI] Removed redundant debug VLOG

* [PHI] Elegant grid bounded solution

* [Accuracy diff No.55-56、76-77] Fix accuracy diff for var&std API (#72879)

* [Accuracy diff No.21] Fix accuracy diff for heaviside API (#72894)

---------

Co-authored-by: Shuhao Liang <[email protected]>
Co-authored-by: Qianyue He <[email protected]>
Co-authored-by: Lei Ding <[email protected]>
Co-authored-by: ggggxm <[email protected]>
Co-authored-by: xkkkkkk23 <[email protected]>
Co-authored-by: Zx <[email protected]>
Co-authored-by: huangjiyi <[email protected]>
Co-authored-by: ooo oo <[email protected]>