This repository was archived by the owner on Oct 25, 2024. It is now read-only.
We provide compression technologies such as `MixedPrecision`, `SmoothQuant`, and `WeightOnlyQuant` with the `Rtn`/`Awq`/`Teq`/`GPTQ`/`AutoRound` algorithms, as well as `BitsandBytes` `load_in_4bit` and `load_in_8bit`, all of which work on the CPU device. The following commands show how to use them.
>**Note**:
> Model type "llama" will by default use [ipex.optimize_transformers](https://github.com/intel/intel-extension-for-pytorch/blob/339bd251841e153ad9c34e1033ab8b2d936a1781/docs/tutorials/llm/llm_optimize_transformers.md) to accelerate inference. Note that "llama" requires a transformers version lower than 4.36.0, and "falcon" requires a version lower than 4.33.3.
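The weight-only quantization algorithms listed above differ mainly in how they pick rounding decisions and scales. As background, here is a minimal, library-independent sketch of what the simplest one, `Rtn` (round-to-nearest), does for symmetric per-group 4-bit quantization. The function names, group size, and layout are illustrative, not the package's API:

```python
import numpy as np

def rtn_quantize(w, bits=4, group_size=32):
    """Symmetric per-group round-to-nearest quantization (illustrative sketch)."""
    groups = w.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)      # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def rtn_dequantize(q, scale, shape):
    """Map integer codes back to approximate float weights."""
    return (q * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64)).astype(np.float32)
q, scale = rtn_quantize(w)
w_hat = rtn_dequantize(q, scale, w.shape)
# per-element error is bounded by half a quantization step of its group
```

The more elaborate algorithms (`Awq`, `Teq`, `GPTQ`, `AutoRound`) improve on this baseline by calibrating scales or rounding against activation data instead of rounding each weight independently.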
This creates an image called `evaluation-harness-multiple`, and runs a test on it.

Suppose the fp32 model is `starcoder-3b`, the quantized model is saved in `saved_results`, and evaluation is done on the `multiple-lua` tasks with:

```
docker run -v $(CURDIR):$(CURDIR) -it /bin/bash
python3 run_generation_sq.py \
    --model $(CURDIR)/starcoder-3b \
    --sq \
    --alpha 0.7 \
    --calib_n_samples 500 \
    --calib_batch_size 1 \
    --dataset "mbpp" \
    --output_dir "$(CURDIR)/saved_results" \
    --accuracy \
    --tasks multiple-py \
    --batch_size 20
```
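The `--sq` and `--alpha 0.7` flags enable SmoothQuant, which migrates quantization difficulty from activations to weights by rescaling each input channel before quantization. A small numpy sketch of the core idea follows; the helper name is illustrative, and the real pass folds the scales into the preceding layer's parameters rather than rescaling at runtime:

```python
import numpy as np

def smoothquant_scales(x_absmax, w, alpha=0.7):
    """Per-input-channel smoothing: s_j = max|X_j|^alpha / max|W_:,j|^(1-alpha)."""
    w_absmax = np.abs(w).max(axis=0)
    return x_absmax ** alpha / w_absmax ** (1.0 - alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x[:, 3] *= 50.0                      # simulate an activation outlier channel
w = rng.normal(size=(16, 8))         # (out_features, in_features)

s = smoothquant_scales(np.abs(x).max(axis=0), w, alpha=0.7)
y_ref = x @ w.T
y_smooth = (x / s) @ (w * s).T       # same output, flatter activation range
```

A larger `alpha` shifts more of the outlier magnitude from activations into the weights; `0.7` is simply the value this command passes.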
To run the container (here from the image `evaluation-harness-multiple`) to quantize and evaluate on `CURDIR` (or mount another directory with `-v`), specify `n_samples` and allow code execution with `--allow_code_execution` (and add the number of problems with `--limit` if it was used during generation):