1 parent aaf5dc5, commit df57cb2
CHANGELOG.md
@@ -1,4 +1,16 @@
# CHANGELOG
+# [Version v1.8.1](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.1)
+v1.8.1
+
+## Functionality
+- Exposed the embedding lookup interface.
+
+## Performance
+- Optimized the performance of grouped query attention (GQA).
+- Sped up key creation for the oneDNN primitive cache.
+- Set the [bs][nh][seq][hs] layout (batch size, number of heads, sequence length, head size) as the default for the KV cache, resulting in better performance.
+- Reduced task-split imbalance in self-attention.
+
# [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)
v1.8.0 Continuous Batching on Single ARC GPU and AMX_FP16 Support.
VERSION
@@ -1 +1 @@
-1.8.0
+1.8.1
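The v1.8.1 entries make [bs][nh][seq][hs] the default KV cache layout and optimize grouped query attention (GQA). As a rough illustration of what that layout implies for indexing, here is a minimal C++ sketch. It assumes one contiguous buffer per layer and that `nh` counts KV heads. The helpers `kvOffset` and `kvHeadForQueryHead` are hypothetical names chosen for this example. They are not xFasterTransformer functions, and this is not how the library necessarily implements its cache.

```cpp
#include <cstddef>

// Illustrative sketch only: these helpers are hypothetical and are not part of
// xFasterTransformer's API.

// Flat offset of element (b, h, s, d) in a KV cache buffer laid out as
// [bs][nh][seq][hs]: batch index, KV-head index, sequence position, head dim.
constexpr std::size_t kvOffset(std::size_t b, std::size_t h, std::size_t s,
                               std::size_t d, std::size_t numKvHeads,
                               std::size_t maxSeqLen, std::size_t headSize) {
    return ((b * numKvHeads + h) * maxSeqLen + s) * headSize + d;
}

// In grouped query attention (GQA), consecutive groups of query heads share
// one KV head. Assumes numQHeads is a multiple of numKvHeads.
constexpr std::size_t kvHeadForQueryHead(std::size_t qHead, std::size_t numQHeads,
                                         std::size_t numKvHeads) {
    return qHead / (numQHeads / numKvHeads);
}

// Worked example: 2 KV heads, room for 4 tokens, head size 8.
// Within one (batch, head) pair, the [seq][hs] slab is contiguous.
static_assert(kvOffset(0, 0, 1, 0, 2, 4, 8) == 8, "next token is headSize apart");
static_assert(kvOffset(0, 1, 0, 0, 2, 4, 8) == 32, "next head is maxSeqLen * headSize apart");
static_assert(kvOffset(1, 0, 0, 0, 2, 4, 8) == 64, "next batch is numKvHeads * maxSeqLen * headSize apart");

// With 8 query heads and 2 KV heads, query heads 0-3 read KV head 0
// and query heads 4-7 read KV head 1.
static_assert(kvHeadForQueryHead(3, 8, 2) == 0, "head 3 is in group 0");
static_assert(kvHeadForQueryHead(4, 8, 2) == 1, "head 4 is in group 1");
```

Because `seq` sits inside `nh` in this ordering, all cached keys (or values) for a single head of a single sequence occupy one contiguous `maxSeqLen * headSize` slab. Under GQA, every query head in a group reads the same slab.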