Skip to content

Use TensorIndexer for the view tests #4237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 12, 2025
Merged

Use TensorIndexer for the view tests #4237

merged 6 commits into from
May 12, 2025

Conversation

naoyam
Copy link
Collaborator

@naoyam naoyam commented Apr 11, 2025

Enabled TensorIndexer for the reshape tests.

I temporarily added a codegen diff result to this PR. This one is more concise as I disabled index hoisting. As far as I can see, there's no concerning change. I haven't verified everything, but I believe most of them are because TensorIndexer can detect more divisible splits, which helps generate simplified indices through more aggressive contig indexing.

Once approved, I'll remove the html file.

Context

Part of #4175.

I'm planning to enable the new indexer globally by default once we are sufficiently confident with it. I'm going to enable it for some of the C++ tests for now. Just manually checking the diff results seems to be the only way to gain some confidence.

All the tests are passing in my local branch, but just having green test results don't necessarily mean everything is properly ported to the new indexer. I'll also check perf changes with the benchmarks, but they may not give clear signals as indexing is just one piece of performance bottlenecks.

@naoyam
Copy link
Collaborator Author

naoyam commented Apr 11, 2025

!test --diff

Copy link

github-actions bot commented Apr 11, 2025

Review updated until commit 26ad85b

Description

  • Enabled TensorIndexer for view and reshape tests

  • Added setup methods to enable TensorIndexer options

  • Temporarily included codegen diff results for review


Changes walkthrough 📝

Relevant files
Enhancement
test_gpu_view.cpp
Enable TensorIndexer in view and reshape tests                     

tests/cpp/test_gpu_view.cpp

  • Modified GpuViewTest to inherit from NVFuserTest and added SetUp
    method to enable TensorIndexer
  • Modified ReshapeReduction to inherit from NVFuserFixtureParamTest and
    added SetUp method to enable TensorIndexer
  • +15/-2   

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Code Duplication

    The SetUp method is duplicated in both GpuViewTest and ReshapeReduction classes. Consider creating a base class to avoid code duplication.

    class GpuViewTest : public NVFuserTest {
     protected:
      void SetUp() override {
        NVFuserTest::SetUp();
        EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"});
      }
    };
    Test Coverage

    Ensure that the new tests cover all edge cases and scenarios, especially since the new indexer is being enabled. Verify that the existing tests are still relevant and effective.

    TEST_F(GpuViewTest, FusionViewDtypeSameSizeOutput) {
    Performance Impact

    Evaluate the performance impact of enabling TensorIndexer globally. Conduct thorough benchmarking to ensure that the benefits outweigh any potential regressions.

    class GpuViewTest : public NVFuserTest {
     protected:
      void SetUp() override {
        NVFuserTest::SetUp();
        EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"});
      }
    };

    @naoyam
    Copy link
    Collaborator Author

    naoyam commented Apr 11, 2025

    !test --diff

    naoyam added 2 commits April 11, 2025 15:30
    This reverts commit 03a1b69.
    @naoyam naoyam marked this pull request as ready for review April 11, 2025 22:38
    @naoyam naoyam requested a review from jjsjann123 April 11, 2025 22:38
    Copy link
    Collaborator

    @jjsjann123 jjsjann123 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    github cannot show the diff 😢

    Not totally sure how I should interpret the diff.

    Searching by ^-
    generated cuda indexing does look simpler. I'm a bit surprised to see the first code diff like these (line 11925 in the diff):

    - if (threadIdx.x + (128 * blockIdx.x)) < 120)
    + if (threadIdx < 120)
    

    I'm not holding anything against this PR, since it's only turning it on in the test. But let's remove the temporary file first before stamping it. I don't want to accidentally add that in our history.

    @naoyam
    Copy link
    Collaborator Author

    naoyam commented Apr 14, 2025

    github cannot show the diff 😢

    Not totally sure how I should interpret the diff.

    Searching by ^- generated cuda indexing does look simpler. I'm a bit surprised to see the first code diff like these (line 11925 in the diff):

    - if (threadIdx.x + (128 * blockIdx.x)) < 120)
    + if (threadIdx < 120)
    

    I'm not holding anything against this PR, since it's only turning it on in the test. But let's remove the temporary file first before stamping it. I don't want to accidentally add that in our history.

    IIRC, that's because the loop ID parallelized by BIDx is actually just a broadcast ID and that the new indexer is able to simplify the index.

    Do you have any questions with other changes? Any concern?

    I'm planning to enable the new indexer globally by default once we are sufficiently confident with it. I'm going to enable it for some of the C++ tests for now. Just manually checking the diff results seems to be the only way to gain some confidence.

    All the tests are passing in my local branch, but just having green test results don't necessarily mean everything is properly ported to the new indexer. I'll also check perf changes with the benchmarks, but they may not give clear signals as indexing is just one piece of performance bottlenecks.

    @naoyam
    Copy link
    Collaborator Author

    naoyam commented Apr 14, 2025

    github cannot show the diff 😢

    No, it doesn't. Please download it and open it locally.

    @jjsjann123
    Copy link
    Collaborator

    IIRC, that's because the loop ID parallelized by BIDx is actually just a broadcast ID and that the new indexer is able to simplify the index.
    Do you have any questions with other changes? Any concern?

    Thanks. my earlier quick scanning does seem to see indexing code getting at least shorter. So that's a positive thing.

    I'll also check perf changes with the benchmarks, but they may not give clear signals as indexing is just one piece of performance bottlenecks.

    Are we seeing mixed performance impact? I don't think we necessarily have to answer all those questions, but is there any significant regression that's worth investigation?
    Since this PR switches the indexing mode on only on cpp tests, not benchmark. Does this mean perf impact would still be later conducted on python benchmark, when we switch to use TensorIndexer by default?

    I don't have much concern on this PR other than that question above.

    But most importantly, let's remove the diff code so I can stamp it.

    @naoyam
    Copy link
    Collaborator Author

    naoyam commented May 9, 2025

    !test

    @naoyam naoyam requested a review from jjsjann123 May 9, 2025 21:54
    Copy link
    Collaborator

    @jjsjann123 jjsjann123 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    thanks for getting me to double check.
    Looks like tmp files are cleaned up.

    🚢

    @naoyam naoyam merged commit 7c1fa59 into main May 12, 2025
    53 checks passed
    @naoyam naoyam deleted the tensorindexer_enable_view branch May 12, 2025 16:28
    samnordmann pushed a commit that referenced this pull request May 15, 2025
    Enabled TensorIndexer for the reshape tests. 
    
    I temporarily added a codegen diff result to this PR. This one is more
    concise as I disabled index hoisting. As far as I can see, there's no
    concerning change. I haven't verified everything, but I believe most of
    them are because TensorIndexer can detect more divisible splits, which
    helps generate simplified indices through more aggressive contig
    indexing.
    
    Once approved, I'll remove the html file.
    
    ### Context 
    Part of #4175. 
    
    I'm planning to enable the new indexer globally by default once we are
    sufficiently confident with it. I'm going to enable it for some of the
    C++ tests for now. Just manually checking the diff results seems to be
    the only way to gain some confidence.
    
    All the tests are passing in my local branch, but just having green test
    results don't necessarily mean everything is properly ported to the new
    indexer. I'll also check perf changes with the benchmarks, but they may
    not give clear signals as indexing is just one piece of performance
    bottlenecks.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants