Skip to content

O3's late condition deduction and missing DCE prevent dead br_if elimination #7454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
xuruiyang2002 opened this issue Apr 5, 2025 · 5 comments

Comments

@xuruiyang2002
Copy link
Contributor

xuruiyang2002 commented Apr 5, 2025

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $_start (result i32)
    (local i32 i32 i32 i32 i32 i32 i32 i32 i32)
    call $foo
    local.set 1
    i32.const 1
    local.set 2
    local.get 1
    local.get 2
    i32.and
    local.set 3
    i32.const -556321873
    local.set 4
    local.get 3
    local.get 4
    i32.ge_u
    local.set 5
    i32.const 0
    i32.load
    local.set 6
    local.get 5
    local.get 6
    i32.and
    local.set 7
    i32.const 0
    local.set 0
    block  ;; label = @1
      local.get 7
      i32.eqz
      br_if 0 (;@1;)
      call $external_function
      i32.const 0
      local.set 0
    end
    local.get 0
    local.set 8
    i32.const 0
    local.get 8
    i32.store
    unreachable)
  (func $foo (result i32)
    i32.const 0
    i32.const 0
    i32.store8
    i32.const 0)
  (memory (;0;) 258 258)
  (export "_start" (func $_start)))

wasm-opt (16dbac1) can eliminate the dead br_if body by -all -O2 but cannot by -all -O3.

Analysis

The most probable issue lies in the optimize-instructions pass:

  • In O3, optimize-instructions fails to deduce the condition to zero:
    • While in O2, it successfully deduces the condition to zero.
    • Additional DCE is performed in the updateAfterInlining of inlining-optimizing in O2.
  • In O3, the condition is eventually reduced to zero after inlining-optimizing, but no further DCE (like vacuum) is performed.
    • This makes the optimization too late compared to O2.

In short, there appears to be a missed optimization in optimize-instructions for O3.

@xuruiyang2002 xuruiyang2002 changed the title O3 fails to eliminate dead br_if dody after reducing condition to zero O3's late condition deduction and missing DCE prevent dead br_if elimination Apr 7, 2025
@kripken
Copy link
Member

kripken commented Apr 7, 2025

I'm not sure this is an issue in OptimizeInstructions, or at least I can't see what to change there to make this better?

On the other hand, adding --merge-blocks --vacuum after -O3 fixes it. Some structures just take more optimization cycles. Specifically merging blocks moves stuff out of the if condition, so it is easily optimized.

If we had a "keep optimizing til fixed point" mode then this would be optimized fully, but we run a fixed pipeline for simplicity, which works most of the time, and avoids unpredictable compilation times.

@xuruiyang2002
Copy link
Contributor Author

I understand what you mean. It’s probably a case where wasm-opt doesn't handle this well due to the ordering and interaction of passes. It's tricky to fully grasp, and as you said, some optimizations might just need more cycles to reach an "optimal state" properly.

@xuruiyang2002
Copy link
Contributor Author

My question about is the ordering and interaction:

I've see the wasm-opt use merge-blocks and vacuum many times in function scope (in addDefaultFunctionOptimizationPasses). and not used anytime in addDefaultGlobalOptimizationPostPasses. Why couldn't add the two in the global scope, since after inlining the wasm-opt still invokes the local optimization OptimizeInstructions? Is it because of other considerations?

@kripken
Copy link
Member

kripken commented Apr 9, 2025

"global" passes operate on the entire module, while "function" passes operate on each function independently. "global" ones are used for things that need to look at multiple functions at a time, like inlining.

In theory if there was a benefit to running merge-blocks between global passes, we would do it. In fact, some global passes run function passes internally, like inlining-optimizing. But in the testcase here, the situation is pretty rare that we need it after the global opts, and I'm not sure the tradeoff is worth it.

In general, we want -O3 to be good for most use cases, but it is fine if sometimes the user must run -O3 -O3 (run it twice) to get the full benefits. And we don't want to make -O3 slower and slower, because that would slow down the common case. That's what I mean by tradeoffs here.

@xuruiyang2002
Copy link
Contributor Author

. And we don't want to make -O3 slower and slower, because that would slow down the common case. That's what I mean by tradeoffs here.

Yeah, I understand what you mean, especially the tradeoffs here. I think it is of low priority and we might be better to leave this for later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants