Skip to content

[Emitting Zero] [Local Tracking] General rule might not handle unary ne emitting zero bits well #7493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
xuruiyang2002 opened this issue Apr 14, 2025 · 3 comments

Comments

@xuruiyang2002
Copy link
Contributor

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $foo (result i32)
    i32.const 0
    i32.load)
  (func $_start (param $0 i32) (param $1 i64) (param $2 i32) 
    (local $3 i32) (local $4 i32) (local $5 i32) (local $6 i32) (local $7 i32) (local $9 i32)
    call $foo
    local.set $4
    i32.const 1
    local.set $3
    local.get $4
    local.get $3
    i32.shl
    i32.const 1
    i32.shr_s
    local.set $6
    i32.const 0
    i32.const 0
    i32.store
    i32.const -259031342
    local.set $7
    local.get $6
    local.get $7
    i32.ne
    local.set $9
    block  ;; label = @1
      local.get $9
      i32.eqz
      br_if 0 (;@1;)
      unreachable
    end
    call $bar)
  (func $bar call $external_function)
  (memory $0 258 258)
  (export "_start" (func $_start)))

wasm-opt (755a8d0) should deduce the condition to false, thus fold the branch to unreachable (further deleting call $bar), which works under -O2 but fails under -O3.

Below is optimized by -all -O3:

(func $_start (type $1) (param $0 i32) (param $1 i64) (param $2 i32)
  (local.set $0
   (i32.shr_s
    (i32.shl
     (i32.load
      (i32.const 0)
     )   
     (i32.const 1)
    )   
    (i32.const 1)
   )   
  )
  (i32.store
   (i32.const 0)
   (i32.const 0)
  )
  (if 
   (i32.ne
    (local.get $0) 
    (i32.const -259031342)
   )   
   (then
    (unreachable)
   )   
  )
  (call $external_function)
 )

As you can see, the condition here

   (i32.ne
    (local.get $0) 
    (i32.const -259031342)
   )   

is not deduced to true.

Similar to #7492, Maybe there a missing rule for it to emit zero bits, or is the local tracking insufficient?

@kripken
Copy link
Member

kripken commented Apr 16, 2025

Yes, limitations of local tracking hit us here: we don't see all the contents in the local, just part of the pattern we are looking for.

I think, however, that a good fix for this would actually be in SimplifyLocals. That will normally move the local into the use:

  (local.set $0
   (i32.shr_s
    (i32.shl
     (i32.load
      (i32.const 0)
     )
     (i32.const 1)
    )
    (i32.const 1)
   )
  )
  (if
   (i32.ne
    (local.get $0)
    (i32.const -259031342)
   )
   (then
    (unreachable)
   )
  )

=>

  (if
   (i32.ne
    (i32.shr_s
     (i32.shl
      (i32.load
       (i32.const 0)
      )
      (i32.const 1)
     )
     (i32.const 1)
    )
    (i32.const -259031342)
   )
   (then
    (unreachable)
   )
  )

(and remove the local.set). The problem here is the inner load, which has effects - that pass sees effects on the local.set's value, and gives up. We could do one of two things:

  1. Make it find the part without effects, and move only that.
  2. Add a pass that un-optimizes nested effects like that. That is,
(local.set $0
 (i32.eqz
  (i32.load)
 )
)

=>

(local.set $inner
 (i32.load)
)
(local.set $0
 (i32.eqz
  (local.get $inner)
 )
)

This is the opposite of what SimplifyLocals does, but it would solve a whole range of these issues, I think.

@xuruiyang2002
Copy link
Contributor Author

That was a clear explanation; I understand now.

The i32.load (with side-effect) in the calculation for $0 prevents the SimplifyLocals from inlining calculation into the if condition. Without such inlining, Precompute-propagation doesn't see the full picture, thus fails to do further optimizations.

Sounds the first solution is limited, second is more general. Add a new pass which could find operations with side-effect and isolate them with temporary variable, thus SimplifyLocals could do inlining as normal.

But how much do these solutions cost? Is it good enough to fix #7492?

@kripken
Copy link
Member

kripken commented Apr 18, 2025

It wouldn't fix that particular issue - the problem there wasn't an inner expression with effects like the load in the last example.

But it would fix this, and iirc a few others. This is a general situation we don't yet handle well enough. Flatten was, in theory, the solution to this, but it does a lot more:

binaryen/src/ir/flat.h

Lines 45 to 55 in e185ff9

// 1. Aside from a local.set, the operands of an instruction must be a
// local.get, a const, an unreachable, or a ref.as_non_null. Anything else
// is written to a local earlier.
// 2. Disallow control flow (block, loop, if, and try) return values, and do
// not allow the function body to have a concrete type, i.e., do not use
// control flow to pass around values.
// 3. Disallow local.tee, setting a local is always done in a local.set
// on a non-nested-expression location.
// 4. local.set cannot have an operand that is control flow (control flow with
// values is prohibited already, but e.g. a block ending in unreachable,
// which can normally be nested, is also disallowed).

We need 1 and 3 from that list, but not 2 and 4. Perhaps a variant of the Flatten pass that only does them is an option here, "minimally flatten" or "flatten expressions (but not control flow)".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants