[PHI][CINN] Fix grid sample kernel for big tensor #72628

lshpku · 2025-05-08T11:06:41Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Instantiate separate int and int64_t versions of GridSampleCudaKernel, and moderately optimize the expressions in the kernel.

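Because this kernel's expressions fall into the "clearly rather complex" category, int64 support is implemented by instantiating the kernel separately for int and int64_t.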
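The dual-instantiation approach can be sketched on the host side as follows. This is a minimal illustration only, not the PR's code: `GridSampleImpl` and `DispatchGridSample` are hypothetical names, and the choice of comparing the largest element count against the int32 maximum is an assumption about how such a dispatch would typically be written.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for the kernel body, templated on the index type.
// A real kernel would use IndexT for every offset computation; here we only
// report which instantiation was selected.
template <typename IndexT>
std::string GridSampleImpl(int64_t numel) {
  (void)numel;  // unused in this sketch
  return sizeof(IndexT) == 4 ? "int" : "int64_t";
}

// Hypothetical dispatcher (assumption): use the cheaper 32-bit instantiation
// whenever every offset fits in int32, otherwise fall back to 64-bit indices.
inline std::string DispatchGridSample(int64_t max_numel) {
  if (max_numel <= std::numeric_limits<int32_t>::max()) {
    return GridSampleImpl<int>(max_numel);
  }
  return GridSampleImpl<int64_t>(max_numel);
}
```

Keeping both instantiations lets small tensors retain 32-bit index arithmetic while big tensors get overflow-safe 64-bit indices.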

The optimizations include:

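Add the __restrict__ qualifier to the IO parameters, which makes loads/stores use the non-coherent versions; in theory this is slightly faster.
Merge multiple loads/stores into a single update of a value, reducing the number of loads/stores.
Merge the H and W dimensions of grid and output to reduce index computation; these two dimensions are fully contiguous, so they can be treated as one dimension.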
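The three optimizations above can be illustrated with a small host-side sketch. Everything here is hypothetical and only mirrors the ideas described: `GridOffset4D`, `GridOffsetFlat`, and `WeightedSum4` are illustrative names, the `[N, H, W, 2]` grid layout is an assumption, and `__restrict__` is the GCC/Clang/NVCC spelling (MSVC uses `__restrict`).

```cpp
#include <cstdint>

// Offset of grid[n, h, w, 0] in a contiguous [N, H, W, 2] tensor, computed
// with separate H and W indices.
template <typename IndexT>
IndexT GridOffset4D(IndexT n, IndexT h, IndexT w, IndexT H, IndexT W) {
  return ((n * H + h) * W + w) * 2;
}

// Same offset with H and W merged into one dimension of size HW = H * W,
// where hw = h * W + w. One multiply-add fewer per lookup.
template <typename IndexT>
IndexT GridOffsetFlat(IndexT n, IndexT hw, IndexT HW) {
  return (n * HW + hw) * 2;
}

// Accumulate four weighted taps in a local variable and store once, instead
// of repeatedly reading and writing the output element. The __restrict__
// qualifiers promise the compiler that `in` and `out` do not alias.
template <typename T, typename IndexT>
void WeightedSum4(const T* __restrict__ in, T* __restrict__ out,
                  const IndexT* idx, const T* weight, IndexT out_idx) {
  T value = 0;
  for (int k = 0; k < 4; ++k) {
    value += in[idx[k]] * weight[k];
  }
  out[out_idx] = value;  // single store
}
```

The flattened form is only valid because the H and W dimensions are adjacent and contiguous in memory, which is the condition the description states for grid and output.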

Rough performance test:

The new int version is 25% faster than the old version.
The new int64_t version is 2% slower than the new int version.

Pcard-85711

paddle-bot · 2025-05-08T11:06:45Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

wanghuancoder

LGTM

* refine forrange (#72360)
  * refine forrange
  * refine forrange
* reduce support big tensor (#71970)
  * reduce support big tensor
* [PHI] Fix gridDim limit for reduce kernel (#72507)
* [API] isclose support bigtensor (#72516)
  * isclose support bigtensor
  * refine
* [API] isnan isinf isfinite support bigtensor (#72517)
  * isnan isinf isfinite support bigtensor
  * refine
* [PHI] Fix cum kernel for big tensor (#72562)
* [PHI] Preliminary fix for elementwise broadcast int32 shape overflow (#72584)
* [PHI] Align linalg.solve kernel with torch (#72608)
* Update strided copy kernel (#72662)
* [PHI] Fix grid sample kernel for big tensor (#72628)
* [PHI] Fix argsort big tensor bug (#72712)
  * [PHI] Fixed argsort big tensor bug
  * [PHI] Fixed shape mismatch problem.
* [PHI] Fix contiguous kernel for big tensor (#72705)
* [PHI] Fix flatten and split kernel for big tensor (#72634)
* [PHI] Fix out-of-bound issue of paddle.take_along_axis (#72757)
* [PHI] fix paddle.diag with big tensor (#72638)
* [API] fix paddle.cross with big tensor (#72652)
* [PHI] Fix paddle.where api for big tensor (#72717)
* [PHI] Fix bincount kernel for big tensor (#72706)
  * fix bincount kernel for big tensor
  * use HostAlloc to alloc memory
  * add cpu test case
* [PHI] Fix full_like kernel for big tensor (#72831)
* [API] Fix int overflow and float16 support for paddle.frac (#72815)
* [PHI] Align paddle.inner with torch in matmul logic (#72843)
* [PHI] Fix paddle.var & paddle.std float16 overflow (#72650)
* [PHI] Fix logsumexp precision problem (#72681)
  * [PHI] Debug for logsumexp, bug source found
  * [PHI] Removed GetNumBlocks func to get correct logsumexp
  * [PHI] Removed redundant debug VLOG
  * [PHI] Elegant grid bounded solution
* [Accuracy diff No.55-56、76-77] Fix accuracy diff for var&std API (#72879)
* [Accuracy diff No.21] Fix accuracy diff for heaviside API (#72894)

---------

Co-authored-by: Shuhao Liang <[email protected]>
Co-authored-by: Qianyue He <[email protected]>
Co-authored-by: Lei Ding <[email protected]>
Co-authored-by: ggggxm <[email protected]>
Co-authored-by: xkkkkkk23 <[email protected]>
Co-authored-by: Zx <[email protected]>
Co-authored-by: huangjiyi <[email protected]>
Co-authored-by: ooo oo <[email protected]>

lshpku force-pushed the fix-grid-sample branch 3 times, most recently from 041ab9e to c87529b on May 9, 2025 02:59

[PHI] Fix grid sample kernel for big tensor

e5d44e2

lshpku force-pushed the fix-grid-sample branch from c87529b to e5d44e2 on May 13, 2025 07:33

wanghuancoder approved these changes May 14, 2025


lshpku merged commit 2bde0e8 into PaddlePaddle:develop May 14, 2025
46 of 47 checks passed

wanghuancoder pushed a commit to wanghuancoder/Paddle that referenced this pull request May 27, 2025

[PHI] Fix grid sample kernel for big tensor (PaddlePaddle#72628)

831663e

lshpku mentioned this pull request Jun 11, 2025

[PHI] Fix grid sample 3d kernel for big tensor #73253

Merged

This was referenced Jul 14, 2025

[PHI] fix grid_sample PFCCLab/PaddleAPITest#367

Closed

[PHI]Fix paddle.nn.functional.grid_sample to support big Tensor #74014

Closed

[PHI] Fix paddle.nn.functional.grid_sample to support big Tensor #74019

Merged
