Here is the expected code that should have been generated for the above snippet on x86_64 (without the incorrect __truncsfbf2 call):
```asm
BitCastI16ToBF16Wrapper:                # @BitCastI16ToBF16Wrapper
        push    rax
        call    BitCastI16ToBF16
        lea     rdi, [rsp + 6]
        call    CreateBF16WrapperFromBF16
        mov     ax, word ptr [rsp + 6]
        pop     rcx
        ret
BitCastI16ToBF16:                       # @BitCastI16ToBF16
        pinsrw  xmm0, word ptr [rdi], 0
        ret
CreateBF16WrapperFromBF16:              # @CreateBF16WrapperFromBF16
        pextrw  eax, xmm0, 0
        mov     word ptr [rdi], ax
        ret
```
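The property the expected code relies on is that a bit cast from i16 to bf16 is a pure 16-bit move: `pinsrw` places the integer's bits in lane 0 of xmm0 and `pextrw` takes them back out, with no arithmetic in between. A float-to-bf16 conversion such as __truncsfbf2 rounds a float and is not a bit copy in general, which is why that call does not belong here. The C sketch below only models this data flow and is not compiler output; it assumes bf16 values are tracked as raw `uint16_t` patterns, and every name in it (`xmm0_lane0_model`, `bitcast_i16_to_bf16_model`, `create_bf16_wrapper_model`, `bitcast_wrapper_model`, `check_all_patterns_preserved`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for lane 0 of xmm0, where the expected code keeps
   the bf16 value between the two calls. */
typedef struct {
    uint16_t lane0;
} xmm0_lane0_model;

/* Models BitCastI16ToBF16: `pinsrw xmm0, word ptr [rdi], 0` copies the
   16-bit integer at *p into lane 0 unchanged. */
xmm0_lane0_model bitcast_i16_to_bf16_model(const uint16_t *p) {
    xmm0_lane0_model r = { *p };
    return r;
}

/* Models CreateBF16WrapperFromBF16: `pextrw eax, xmm0, 0` followed by
   `mov word ptr [rdi], ax` stores lane 0 unchanged. */
void create_bf16_wrapper_model(uint16_t *out, xmm0_lane0_model v) {
    *out = v.lane0;
}

/* Models BitCastI16ToBF16Wrapper: the value handed back in ax carries
   exactly the bits that were read from *p; nothing rounds or converts. */
uint16_t bitcast_wrapper_model(const uint16_t *p) {
    uint16_t slot; /* plays the role of the stack slot [rsp + 6] */
    create_bf16_wrapper_model(&slot, bitcast_i16_to_bf16_model(p));
    return slot;
}

/* Exhaustively checks all 65536 bit patterns, NaN encodings included. */
void check_all_patterns_preserved(void) {
    for (uint32_t i = 0; i <= 0xFFFFu; ++i) {
        uint16_t x = (uint16_t)i;
        assert(bitcast_wrapper_model(&x) == x);
    }
}
```

Calling `check_all_patterns_preserved()` from a test driver confirms that the modeled round trip is the identity on every pattern, which is the behavior the listing above implements and an inserted __truncsfbf2 call would break.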
There is also a second bug on x86_32: the generated code assumes that BitCastI16ToBF16 returns its result as a 32-bit floating-point value in st(0), rather than as a BF16 value in the xmm0 register.
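The two return conventions carry different representations: a BF16 result is a 16-bit pattern, while a 32-bit float is a full binary32 word. The minimal C sketch below, which assumes `float` is IEEE 754 binary32 (as on x86), shows how the two relate: BF16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of a binary32, so widening a BF16 pattern to the float it denotes is an exact 16-bit shift. The helper names `bf16_bits_to_float` and `float_bits` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widens a bf16 bit pattern to the float it denotes by placing its 16 bits
   in the high half of a 32-bit word; the low 16 mantissa bits are zero, so
   no rounding occurs. */
float bf16_bits_to_float(uint16_t bits) {
    uint32_t u = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &u, sizeof f); /* well-defined type pun */
    return f;
}

/* Returns the 32-bit pattern of a float, i.e. the representation a caller
   works with when a function returns a plain 32-bit float. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For example, the BF16 pattern 0x3F80 denotes 1.0, whose binary32 pattern is 0x3F800000: the same value, but a different width and encoding, so code that expects one convention cannot correctly consume the other.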