[PHI][CINN] Fix grid sample kernel for big tensor #72628
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
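Instantiate separate int and int64_t versions of GridSampleCudaKernel, and moderately optimize the expressions inside the kernel.
Because the expressions in this kernel are at the "obviously rather complex" level, int64 support is implemented by instantiating the int and int64_t versions separately.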
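The dual-instantiation approach described above can be sketched roughly as follows. This is a host-side C++ stand-in, not the actual GridSampleCudaKernel: the names ScaleKernel, NeedsInt64Index, and LaunchScale are hypothetical, and choosing the index type by comparing the element count against INT_MAX is an assumption about how such a dispatch is typically written.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a kernel body. All index arithmetic uses IndexT,
// so the int instantiation keeps 32-bit arithmetic while the int64_t
// instantiation stays correct once offsets exceed INT_MAX.
template <typename IndexT>
void ScaleKernel(const float* in, float* out, IndexT numel, float scale) {
  for (IndexT i = 0; i < numel; ++i) {
    out[i] = in[i] * scale;
  }
}

// Assumed dispatch rule: use 64-bit indices only when the element count
// does not fit in a signed 32-bit int.
inline bool NeedsInt64Index(int64_t numel) {
  return numel > static_cast<int64_t>(std::numeric_limits<int>::max());
}

// Picks the instantiation once on the host and returns the index width
// (in bits) that was used, purely so the choice is observable.
inline int LaunchScale(const float* in, float* out, int64_t numel,
                       float scale) {
  assert(numel >= 0);
  if (NeedsInt64Index(numel)) {
    ScaleKernel<int64_t>(in, out, numel, scale);
    return 64;
  }
  ScaleKernel<int>(in, out, static_cast<int>(numel), scale);
  return 32;
}
```

With this shape, small tensors keep the 32-bit path and only tensors beyond the int range take the 64-bit instantiation.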
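Optimizations include:
Adding the __restrict__ qualifier, which lets loads/stores use their non-coherent versions; in theory this is slightly faster.
Rough performance test: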
Pcard-85711