[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER #17153
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Force-pushed from 49f09d6 to c93da8a
Signed-off-by: Akash Kaothalkar <[email protected]>
Force-pushed from c93da8a to f89176e
Can you make a test for this, or at least confirm you've manually tested it? Specifically, a kernel test would be great.
Hi @mgoin,
I've run model-level tests manually on POWER; I'll plan to add kernel-level or architecture-specific tests as a follow-up. A rough sketch of what such a test could look like follows the logs below.
Logs from model tests on POWER:
The output of `python collect_env.py`:
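A kernel-level test might look roughly like the sketch below (pytest style). `scaled_int8_mm` is a hypothetical stand-in for the kernel entry point, not the PR's actual API, and the shapes and tolerances are assumptions:

```python
import pytest
import torch

def scaled_int8_mm(qa, sa, qw, sw):
    # Stand-in for the kernel under test: int8 x int8 GEMM accumulated
    # in int32, then dequantized with the product of the two scales.
    acc = qa.to(torch.int32) @ qw.to(torch.int32).T
    return acc.to(torch.float32) * (sa * sw)

@pytest.mark.parametrize("m,n,k", [(16, 32, 64), (1, 128, 256)])
def test_w8a8_int8_close_to_fp32(m, n, k):
    torch.manual_seed(0)
    a = torch.randn(m, k)                  # fp32 activations
    w = torch.randn(n, k)                  # fp32 weights (out x in)
    sa = (a.abs().max() / 127.0).item()    # per-tensor activation scale
    sw = (w.abs().max() / 127.0).item()    # per-tensor weight scale
    qa = torch.clamp(torch.round(a / sa), -128, 127).to(torch.int8)
    qw = torch.clamp(torch.round(w / sw), -128, 127).to(torch.int8)
    out = scaled_int8_mm(qa, sa, qw, sw)
    ref = a @ w.T
    # Loose tolerance; the exact quantization error budget is an assumption.
    assert torch.allclose(out, ref, atol=0.05 * ref.abs().max().item())
```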
This PR adds support for compressed-tensors W8A8 INT8 quantization on the POWER architecture using oneDNN.
Key changes include:
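For context, here is a minimal sketch of the W8A8 INT8 scheme, assuming symmetric per-tensor scales: weights and activations are quantized to int8, the GEMM accumulates in int32, and both scales are folded back in during dequantization. This is an illustrative standalone example, not the PR's oneDNN path:

```python
import torch

def quantize_per_tensor(t: torch.Tensor):
    # Symmetric per-tensor int8 quantization: q = clamp(round(t / scale)).
    scale = (t.abs().max() / 127.0).item()
    q = torch.clamp(torch.round(t / scale), -128, 127).to(torch.int8)
    return q, scale

def w8a8_linear(a: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # int8 activations x int8 weights, int32 accumulation, fp32 dequant.
    qa, sa = quantize_per_tensor(a)
    qw, sw = quantize_per_tensor(w)
    acc = qa.to(torch.int32) @ qw.to(torch.int32).T   # integer GEMM
    return acc.to(torch.float32) * (sa * sw)          # dequantize

a = torch.randn(4, 64)       # activations
w = torch.randn(128, 64)     # weights (out_features x in_features)
print(torch.allclose(w8a8_linear(a, w), a @ w.T, atol=0.5))  # ~True
```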