
Commit d2f2dd1

update doc
Signed-off-by: zhenwei-intel <[email protected]>
1 parent c8b00fd commit d2f2dd1

1 file changed (+1 -1)

docs/weightonlyquant.md

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir)
 > Note: For LLM runtime model loading usage, please refer to [neural_speed readme](https://github.com/intel/neural-speed/blob/main/README.md#quick-start-transformer-like-usage)
 
 ## Examples For Intel GPU
-Intel-extension-for-transformers implement weight-only quantization for intel GPU(PVC/ARC/MTL) with [Intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch). Currently, the Linear op kernel of Weight-only quantization is implemented in the Intel-extension-for-pytorch branch: "dev/QLLM".
+Intel-extension-for-transformers implement weight-only quantization for intel GPU(PVC/ARC/MTL) with [Intel-extension-for-pytorch](https://github.com/intel/intel-extension-for-pytorch).
 
 Now 4-bit/8-bit inference with `RtnConfig`, `AwqConfig`, `GPTQConfig`, `AutoRoundConfig` are support on intel GPU device.

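For context, the passage touched by this diff describes weight-only quantized inference on Intel GPUs through the transformers-like API. Below is a minimal, hypothetical sketch of 4-bit RTN inference on the "xpu" device with intel-extension-for-transformers; the model name, `RtnConfig` argument values, and generation settings are illustrative assumptions, not part of this commit.

```python
# Hypothetical sketch: 4-bit weight-only (RTN) inference on an Intel GPU ("xpu").
# Assumes intel-extension-for-transformers and intel-extension-for-pytorch are
# installed; the model name and config values below are illustrative only.
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig

model_name = "Qwen/Qwen-7B"  # placeholder; any supported causal LM checkpoint
woq_config = RtnConfig(
    bits=4,                  # 4-bit weight-only quantization
    compute_dtype="fp16",
    scale_dtype="fp16",
    group_size=64,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=woq_config,  # AwqConfig/GPTQConfig/AutoRoundConfig work analogously
    device_map="xpu",                # place the quantized model on the Intel GPU
    trust_remote_code=True,
)

inputs = tokenizer("Weight-only quantization is", return_tensors="pt").input_ids.to("xpu")
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```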
Comments (0)