This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Add DynamicQuantConfig and QuantAwareTrainingConfig #1505

Merged: 4 commits into main from wangchang/dynamic on Apr 25, 2024

Conversation

@changwangss (Contributor) commented on Apr 23, 2024

Type of Change

Support dynamic quantization with DynamicQuantConfig, making its usage transformers-API-like, consistent with the INC 2.x API (see the example below).
Support quantization-aware training (QAT) with QuantAwareTrainingConfig, making its usage transformers-API-like, consistent with the INC 2.x API (see the example below).

Description

JIRA ticket: xxx
Dynamic quantization

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, DynamicQuantConfig
from transformers import AutoTokenizer

model_name_or_path = "hf-internal-testing/tiny-random-gptj"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Build a small example input with the tokenizer
dummy_input = tokenizer("This is a sample input.", return_tensors="pt")["input_ids"]

# Dynamic quantization
dq_config = DynamicQuantConfig()
q_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=dq_config,
)
q_model.eval()
output = q_model(dummy_input)

# Save
q_model.save_pretrained("./saved_results")

# Load
q_model = AutoModelForCausalLM.from_pretrained("./saved_results")
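
To sanity-check the result, you can inspect which layers were replaced. A minimal sketch, assuming DynamicQuantConfig applies PyTorch dynamic quantization under the hood (as INC 2.x does for this mode), so swapped layers show up as torch.ao.nn.quantized.dynamic.Linear:

import torch

# Print every Linear layer that was replaced by its dynamically
# quantized counterpart (assumes a PyTorch dynamic-quant backend).
for name, module in q_model.named_modules():
    if isinstance(module, torch.ao.nn.quantized.dynamic.Linear):
        print(f"{name}: dynamically quantized Linear")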

Quantization-aware training

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, QuantAwareTrainingConfig
from transformers import AutoTokenizer

model_name_or_path = "hf-internal-testing/tiny-random-gptj"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Build a small example input with the tokenizer
dummy_input = tokenizer("This is a sample input.", return_tensors="pt")["input_ids"]

# Quantization-aware training
qat_config = QuantAwareTrainingConfig(
    tokenizer=tokenizer,  # provide one of the two: tokenizer or train_func
)
q_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=qat_config,
)
q_model.eval()
output = q_model(dummy_input)

# Save
q_model.save_pretrained("./saved_results")

# Load
q_model = AutoModelForCausalLM.from_pretrained("./saved_results")
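
The tokenizer/train_func comment above means the config needs some way to drive training. A minimal sketch of the train_func alternative, assuming that, as in the INC 2.x API, train_func is a callable that receives the QAT-prepared model, runs the fine-tuning loop, and returns the trained model (the exact signature should be verified against the released API):

def train_func(model):
    # Hypothetical user-supplied training loop: fine-tune the
    # QAT-prepared model here, then return it for conversion.
    ...
    return model

qat_config = QuantAwareTrainingConfig(train_func=train_func)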

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

github-actions bot commented on Apr 23, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/transformers/__init__.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/__init__.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/__init__.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/__init__.py, intel_extension_for_transformers/transformers/utils/config.py, tests/CI/test_quantization.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the same file changes listed above.

🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success

These checks are required after the same file changes listed above.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the same file changes listed above.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@changwangss changed the title from "Add DynamicQuantConfig" to "Add DynamicQuantConfig and QuantAwareTrainingConfig" on Apr 23, 2024
@changwangss (Contributor, Author) commented:

CI failed due to the torch 2.3 upgrade; the fix depends on PR #1508.

@VincyZhang merged commit 6a15b48 into main on Apr 25, 2024 (18 checks passed)
@VincyZhang deleted the wangchang/dynamic branch on Apr 25, 2024 at 06:44