LLVM incorrectly emits a __truncsfbf2 call after a bfloat function call in unoptimized code on x86

Here is an LLVM IR snippet for which LLVM generates incorrect code on x86: https://godbolt.org/z/4Ezb51We8

Here is the code that should have been generated for the above snippet on x86_64, without the incorrect `__truncsfbf2` call:
```
BitCastI16ToBF16Wrapper:                # @BitCastI16ToBF16Wrapper
        push    rax
        call    BitCastI16ToBF16
        lea     rdi, [rsp + 6]
        call    CreateBF16WrapperFromBF16
        mov     ax, word ptr [rsp + 6]
        pop     rcx
        ret
BitCastI16ToBF16:                       # @BitCastI16ToBF16
        pinsrw  xmm0, word ptr [rdi], 0
        ret
CreateBF16WrapperFromBF16:              # @CreateBF16WrapperFromBF16
        pextrw  eax, xmm0, 0
        mov     word ptr [rdi], ax
        ret
```
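To illustrate why a stray `__truncsfbf2` call can corrupt the result, here is a minimal C sketch. It rests on assumptions not stated in this report: that the extra call reinterprets the low 32 bits of `xmm0` as a `float`, that the upper bits of that register happen to be zero, and that the conversion rounds to nearest even. `model_truncsfbf2_bits` and `spurious_truncation` are hypothetical names that only model the behavior; they are not the compiler-rt implementation.

```c
#include <stdint.h>

/* Hypothetical model of a float -> bfloat16 conversion, operating on the
   raw IEEE-754 binary32 bit pattern. Rounds to nearest, ties to even.
   NaN handling is deliberately omitted to keep the sketch short. */
static uint16_t model_truncsfbf2_bits(uint32_t float_bits) {
    /* 0x7FFF plus the lowest kept bit implements ties-to-even. */
    uint32_t rounding_bias = 0x7FFFu + ((float_bits >> 16) & 1u);
    return (uint16_t)((float_bits + rounding_bias) >> 16);
}

/* Models the suspected failure: a value that is already bfloat16 sits in
   the low 16 bits of the register (upper bits assumed zero), and the
   conversion is applied to it as if it were a 32-bit float. */
static uint16_t spurious_truncation(uint16_t bf16_bits) {
    uint32_t low32_of_register = bf16_bits; /* assumption: upper bits are 0 */
    return model_truncsfbf2_bits(low32_of_register);
}
```

Under these assumptions, `spurious_truncation(0x3F80)` (the bfloat16 encoding of 1.0) yields `0x0000`, because the 16 significant bits land in the part of the 32-bit pattern that the conversion discards. The value returned by `BitCastI16ToBF16` should instead be passed through unchanged.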

There is also a second bug on x86_32: the generated code assumes that the result of `BitCastI16ToBF16` is returned as a 32-bit floating-point value in `st(0)` instead of as a BF16 value in the `xmm0` register.

LLVM incorrectly emits a __truncsfbf2 call after a bfloat function call in unoptimized code on x86 #151692