This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Add DynamicQuantConfig and QuantAwareTrainingConfig #1505

Merged: 4 commits into main from wangchang/dynamic on Apr 25, 2024

Conversation

@changwangss (Contributor) commented on Apr 23, 2024

Type of Change

Support dynamic quantization with DynamicQuantConfig, making its usage transformers-API-like, consistent with the INC 2.x API (see the example below).
Support quantization-aware training (QAT) with QuantAwareTrainingConfig, making its usage transformers-API-like, consistent with the INC 2.x API (see the example below).

Description

JIRA ticket: xxx
Dynamic quantization

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, DynamicQuantConfig
from transformers import AutoTokenizer

model_name_or_path = "hf-internal-testing/tiny-random-gptj"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Build a small example input with the tokenizer
dummy_input = tokenizer("This is a sample input.", return_tensors="pt")["input_ids"]

# Dynamic quantization
dq_config = DynamicQuantConfig()
q_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=dq_config,
)
q_model.eval()
output = q_model(dummy_input)

# Save
q_model.save_pretrained("./saved_results")

# Load
q_model = AutoModelForCausalLM.from_pretrained("./saved_results")
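
To sanity-check the result, you can inspect which layers were replaced. A minimal sketch, assuming DynamicQuantConfig applies PyTorch dynamic quantization under the hood (as INC 2.x does for this mode), so swapped layers show up as torch.ao.nn.quantized.dynamic.Linear:

import torch

# Print every Linear layer that was replaced by its dynamically
# quantized counterpart (assumes a PyTorch dynamic-quant backend).
for name, module in q_model.named_modules():
    if isinstance(module, torch.ao.nn.quantized.dynamic.Linear):
        print(f"{name}: dynamically quantized Linear")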

Quantization-aware training

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, QuantAwareTrainingConfig
from transformers import AutoTokenizer

model_name_or_path = "hf-internal-testing/tiny-random-gptj"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Build a small example input with the tokenizer
dummy_input = tokenizer("This is a sample input.", return_tensors="pt")["input_ids"]

# Quantization-aware training
qat_config = QuantAwareTrainingConfig(
    tokenizer=tokenizer,  # provide one of the two: tokenizer or train_func
)
q_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=qat_config,
)
q_model.eval()
output = q_model(dummy_input)

# Save
q_model.save_pretrained("./saved_results")

# Load
q_model = AutoModelForCausalLM.from_pretrained("./saved_results")
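
The tokenizer/train_func comment above means the config needs some way to drive training. A minimal sketch of the train_func alternative, assuming that, as in the INC 2.x API, train_func is a callable that receives the QAT-prepared model, runs the fine-tuning loop, and returns the trained model (the exact signature should be verified against the released API):

def train_func(model):
    # Hypothetical user-supplied training loop: fine-tune the
    # QAT-prepared model here, then return it for conversion.
    ...
    return model

qat_config = QuantAwareTrainingConfig(train_func=train_func)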

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

github-actions bot commented on Apr 23, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/transformers/__init__.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/__init__.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/__init__.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/__init__.py, intel_extension_for_transformers/transformers/utils/config.py, tests/CI/test_quantization.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the same file changes listed above.

🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success

These checks are required after the same file changes listed above.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the same file changes listed above.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@changwangss changed the title from "Add DynamicQuantConfig" to "Add DynamicQuantConfig and QuantAwareTrainingConfig" on Apr 23, 2024
@changwangss (Contributor, Author) commented:

CI failed due to the torch 2.3 upgrade; the fix depends on PR #1508.

@VincyZhang merged commit 6a15b48 into main on Apr 25, 2024 (18 checks passed)
@VincyZhang deleted the wangchang/dynamic branch on Apr 25, 2024 at 06:44