
Releases: intel/xFasterTransformer

IntrinsicGemm

26 Sep 15:07

xDNN v1.5.6

  • Fix BF16 pack issue when transpose=true.

xDNN v1.5.5

  • Add bf16fp8bf16 gemv.

xDNN v1.5.4

  • Optimize BF16 pack performance
  • Add bf16bf16bf16 gemm

xDNN v1.5.3

  • Add beta parameter to AMX gemm

xDNN v1.5.2

  • Add AMX_FP16 support for small gemm.
  • Built with GCC 13.2.1

xDNN v1.5.1

  • Add xdnn_hgemm_f32f16f32_packb_block.

xDNN v1.5.0

  • Add hgemm w/ fp32 bias.

xDNN v1.4.6

  • Add alpha and beta param to small_sgemm_f32f16bf16.
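
For readers unfamiliar with the alpha and beta parameters, the sketch below shows the conventional BLAS-style meaning, C = alpha * A * B + beta * C, as a plain reference loop. It assumes xDNN follows that convention, which the release notes do not state. The function name `reference_gemm`, its signature, and the row-major layout are hypothetical and are not the xDNN API.

```cpp
#include <cstddef>
#include <vector>

// Reference (unoptimized) GEMM: C = alpha * A * B + beta * C.
// A is M x K, B is K x N, C is M x N, all row-major.
// Hypothetical helper for illustration only; not an xDNN function.
void reference_gemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                    const std::vector<float>& A, const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // alpha scales the product; beta scales the existing contents of C.
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

With beta = 0 the previous contents of C are discarded; with beta = 1 the product is accumulated into C, which is the usual way to add a residual or a partial sum.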

xDNN v1.4.5

  • Add post op gelu activation.
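
As a rough illustration of what a GELU post op computes, the sketch below applies the exact (erf-based) GELU element-wise to a GEMM output buffer. Whether xDNN uses the erf form or the tanh approximation is not stated in the release notes, so treat the formula choice as an assumption. The names `gelu` and `apply_gelu_post_op` are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
// Assumption: the erf form; a tanh approximation is also common.
inline float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

// Hypothetical post op: applied in place to the GEMM result.
void apply_gelu_post_op(std::vector<float>& c) {
    for (float& v : c) {
        v = gelu(v);
    }
}
```

Fusing the activation as a post op typically avoids a second pass over the output matrix, since each value is transformed while it is still in registers or cache.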

xDNN v1.4.4

  • Fix AMX illegal instruction issue.

xDNN v1.4.3

  • Add small_sgemm_f32bf16bf16.
  • Add small_sgemm_f32f16bf16.

xDNN v1.4.2

  • Support amx_gemm_bf16bf16bf16 kernel w/ any shapes.

xDNN v1.4.1

  • Add sgemm_bf16bf16f32 and sgemm_f32bf16bf16 kernels.
  • Add softmax kernels.
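
For context, a softmax kernel normalizes a row of scores into probabilities. The sketch below is a minimal, numerically stable scalar version (subtracting the row maximum before exponentiating). It is not the xDNN kernel; the name `softmax_row` and its interface are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row.
// Hypothetical reference helper; not an xDNN function.
std::vector<float> softmax_row(const std::vector<float>& x) {
    std::vector<float> y(x.size());
    if (x.empty()) return y;
    // Subtracting the maximum keeps exp() from overflowing.
    const float max_val = *std::max_element(x.begin(), x.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i];
    }
    for (float& v : y) {
        v /= sum;
    }
    return y;
}
```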

xDNN v1.4.0

  • Add hgemm_f32u4f32 kernels.
  • Add sgemm_f32nf4f32 kernels.

xDNN v1.3.1

  • Fix sgemm_f32u4f32 kernels parallel bug.

xDNN v1.3.0

  • Add sgemm_f32u4f32 kernels

xDNN v1.2.1

  • Add xdnn_small_amx_sgemm_bf16bf16bf16_packb implementation with transposed weight.

xDNN v1.2

  • Add xdnn_small_amx_sgemm_bf16bf16bf16_packb implementation.
  • Add xdnn_small_amx_sgemm_bf16bf16bf16_compute implementation.

xDNN v1.1

  • Add bgemm_f32bf16f32_packb weight format BA16a64b2a.
  • Add intrinsic extension api.

xDNN v1.0

  • Add sgemm kernels
  • Add sgemm_f32f16f32 kernels
  • Add sgemm_f32i8f32 kernels
  • Add hgemm_f32f16f32 kernels
  • Add hgemm_f16f16f32 kernels
  • Add hgemm_f32i8f32 kernels
  • Add bgemm_f32bf16f32 kernels
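
Many of the kernels listed here mix fp32 with bf16 data. As background, bf16 (bfloat16) keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE fp32 value, so conversion amounts to taking the upper 16 bits. The sketch below shows that conversion with round-to-nearest-even. The helper names are hypothetical, NaN handling is omitted, and nothing here is taken from the xDNN sources.

```cpp
#include <cstdint>
#include <cstring>

// fp32 -> bf16: keep the upper 16 bits, rounding to nearest even.
// Hypothetical helper; NaN inputs are not handled specially.
inline std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// bf16 -> fp32: place the 16 bits in the upper half and zero the rest.
inline float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Because bf16 shares the fp32 exponent range, the conversion loses precision but not dynamic range, which is one reason it is popular for weights and activations in inference.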