LLVM incorrectly emits a __truncsfbf2 call after a bfloat function call in unoptimized code on x86 #151692

@johnplatts

Here is an LLVM IR snippet for which LLVM generates incorrect code on x86: https://godbolt.org/z/4Ezb51We8

Here is the code that should be generated for the above snippet on x86_64 (without the incorrect __truncsfbf2 call):

BitCastI16ToBF16Wrapper:                # @BitCastI16ToBF16Wrapper
        push    rax
        call    BitCastI16ToBF16
        lea     rdi, [rsp + 6]
        call    CreateBF16WrapperFromBF16
        mov     ax, word ptr [rsp + 6]
        pop     rcx
        ret
BitCastI16ToBF16:                       # @BitCastI16ToBF16
        pinsrw  xmm0, word ptr [rdi], 0
        ret
CreateBF16WrapperFromBF16:              # @CreateBF16WrapperFromBF16
        pextrw  eax, xmm0, 0
        mov     word ptr [rdi], ax
        ret

In addition, there is a second bug on x86_32: the generated code assumes that the result of BitCastI16ToBF16 is returned as a 32-bit floating-point value in st(0) instead of as a BF16 value in the xmm0 register.
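
To illustrate why the extra __truncsfbf2 call is harmful rather than merely redundant, here is a rough C++ sketch. It is not taken from LLVM or compiler-rt: FloatToBF16Bits is a hypothetical software model of a float-to-bfloat16 conversion with round-to-nearest-even, and LowBitsAsFloat models reading the 16-bit return value as a 32-bit float under the assumption that the upper 16 bits are zero (in real codegen the upper bits of xmm0 may hold anything).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of a float -> bfloat16 numeric conversion using
// round-to-nearest-even. This is NOT the actual __truncsfbf2 source.
static uint16_t FloatToBF16Bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN input: keep the upper half and force a quiet-NaN mantissa bit.
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add a bias so that dropping the low 16 bits rounds to nearest,
  // with ties going to the even result.
  const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Model of misreading a 16-bit bf16 payload as a 32-bit float.
// Assumption for this demo only: the upper 16 bits are zero.
static float LowBitsAsFloat(uint16_t bf16_bits) {
  const uint32_t bits = bf16_bits;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Under these assumptions, the bf16 bit pattern 0x3F80 (1.0) is read as a tiny denormal float, and FloatToBF16Bits(LowBitsAsFloat(0x3F80)) yields 0x0000. The expected code above instead passes the 16 bits from BitCastI16ToBF16 to CreateBF16WrapperFromBF16 unchanged.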
