
Commit ee12fa8

xin3he and xinhe3 authored
replace XPU with Intel GPU (#2198)
* replace XPU with Intel GPU

Signed-off-by: Xin He <[email protected]>

---------

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>
1 parent 88a9fd3 commit ee12fa8

7 files changed: +11 −11 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
 Choose the necessary framework dependencies to install based on your deploy environment.
 ### Install Framework
 * [Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
-* [Install intel_extension_for_pytorch for XPU](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
+* [Install intel_extension_for_pytorch for Intel GPU](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
 * [Use Docker Image with torch installed for HPU](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#bare-metal-fresh-os-single-click)
 **Note**: There is a version mapping between Intel Neural Compressor and Gaudi Software Stack, please refer to this [table](./docs/source/3x/gaudi_version_map.md) and make sure to use a matched combination.
 * [Install torch for other platform](https://pytorch.org/get-started/locally)

docs/source/3x/PyTorch.md

Lines changed: 1 addition & 1 deletion
@@ -267,7 +267,7 @@ Deep Learning</a></td>

 3. How to specify an accelerator?

-> Neural Compressor provides automatic accelerator detection, including HPU, XPU, CUDA, and CPU.
+> Neural Compressor provides automatic accelerator detection, including HPU, Intel GPU, CUDA, and CPU.

 > The automatically detected accelerator may not be suitable for some special cases, such as poor performance, memory limitations. In such situations, users can override the detected accelerator by setting the environment variable `INC_TARGET_DEVICE`.

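As a side note, a minimal sketch of the override described in the diff context above, assuming `INC_TARGET_DEVICE` is read from the process environment before Neural Compressor selects an accelerator; the value `"cpu"` is an illustrative choice, not the only valid one:

```python
import os

# Override automatic accelerator detection before Neural Compressor
# initializes; the variable name comes from the doc text above, and
# "cpu" is an illustrative value for sidestepping a misdetected device.
os.environ["INC_TARGET_DEVICE"] = "cpu"
```
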
docs/source/faq.md

Lines changed: 2 additions & 2 deletions
@@ -33,12 +33,12 @@ torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed
 Try increasing `percdamp` (percent of the average Hessian diagonal to use for dampening),
 or increasing `nsamples` (the number of calibration samples).
 #### Issue 7:
-If you run GPTQ quantization with transformers-like API on xpu device, then you may encounter the following error:
+If you run GPTQ quantization with transformers-like API on Intel GPU device, then you may encounter the following error:
 ```shell
 [ERROR][modeling_auto.py:128] index 133 is out of bounds for dimension 0 with size 128
 [ERROR][modeling_auto.py:129] Saved low bit model loading failed, please check your model.
 HINT:
-XPU device does not support `g_idx` for GPTQ quantization now. Please stay tuned.
+Intel GPU device does not support `g_idx` for GPTQ quantization now. Please stay tuned.
 You can set desc_act=False.
 ```
 #### Issue 8:
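
For reference, a hedged sketch of the `desc_act=False` workaround mentioned in the HINT above, assuming the transformers-like API mirrors the Hugging Face-style `GPTQConfig`; the model name is illustrative:

```python
# Sketch: disable activation-order (g_idx) reordering for Intel GPU,
# assuming a Hugging Face-style GPTQConfig in the transformers-like API.
from neural_compressor.transformers import AutoModelForCausalLM, GPTQConfig

quant_config = GPTQConfig(bits=4, group_size=128, desc_act=False)  # desc_act=False avoids g_idx
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # illustrative model name
    quantization_config=quant_config,
)
```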

docs/source/quantization.md

Lines changed: 3 additions & 3 deletions
@@ -459,7 +459,7 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow and ONNX
 <td align="left">IPEX</td>
 <td align="left">OneDNN</td>
 <td align="left">"ipex"</td>
-<td align="left">cpu | xpu</td>
+<td align="left">cpu | Intel GPU</td>
 </tr>
 <tr>
 <td rowspan="5" align="left">ONNX Runtime</td>
@@ -524,7 +524,7 @@ conf = PostTrainingQuantConfig()
 ```python
 # run with IPEX on CPU
 conf = PostTrainingQuantConfig(backend="ipex")
-# run with IPEX on XPU
+# run with IPEX on Intel GPU
 conf = PostTrainingQuantConfig(backend="ipex", device="xpu")
 ```
 ```python
@@ -543,4 +543,4 @@ conf = PostTrainingQuantConfig(backend="itex", device="gpu")
 ## Examples

 User could refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/README.md) on how to quantize a new model.
-If user wants to quantize an onnx model with npu, please refer to this [example](../../examples/onnxrt/image_recognition/onnx_model_zoo/shufflenet/quantization/ptq_static/README.md). If user wants to quantize a pytorch model with xpu, please refer to this [example](../../examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/README.md).
+If user wants to quantize an onnx model with npu, please refer to this [example](../../examples/onnxrt/image_recognition/onnx_model_zoo/shufflenet/quantization/ptq_static/README.md). If user wants to quantize a pytorch model with Intel GPU, please refer to this [example](../../examples/pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex/README.md).
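
To connect the config snippets in this diff to a full call, a minimal sketch using `quantization.fit` from the 2.x API; the tiny model and random calibration data are placeholders, the `device="xpu"` string is taken from the diff above, and real workloads should follow the linked examples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder model and calibration data; replace with a real workload.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
calib_loader = DataLoader(
    TensorDataset(torch.randn(32, 4), torch.zeros(32, dtype=torch.long)),
    batch_size=8,
)

conf = PostTrainingQuantConfig(backend="ipex", device="xpu")  # IPEX on Intel GPU
q_model = quantization.fit(model, conf, calib_dataloader=calib_loader)
```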

examples/3.x_api/pytorch/image_recognition/torchvision_models/quantization/static_quant/ipex/README.md

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ Please refer to [intel/intel-extension-for-pytorch(github.com)](https://github.c
 python -m pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable
 ```

-### Install IPEX XPU
+### Install IPEX Intel GPU
 Please build an IPEX docker container according to the [official guide](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.30%2bxpu&os=linux%2fwsl2&package=docker).

 You can run a simple sanity test to double confirm if the correct version is installed, and if the software stack can get correct hardware information onboard your system. The command should return PyTorch and IPEX versions installed, as well as GPU card(s) information detected.
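
A sketch of such a sanity check as a short script, assuming the `torch.xpu` namespace that IPEX registers on import:

```python
import torch
import intel_extension_for_pytorch as ipex

# Print framework versions and enumerate detected Intel GPU cards.
print(torch.__version__)
print(ipex.__version__)
for i in range(torch.xpu.device_count()):
    print(f"[{i}]: {torch.xpu.get_device_properties(i)}")
```
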
@@ -84,7 +84,7 @@ bash run_quant.sh --input_model=resnext101_32x16d_wsl --dataset_location=/path/t
 bash run_benchmark.sh --input_model=resnext101_32x16d_wsl --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
 ```

-# Run with XPU
+# Run with Intel GPU

 > Note: All torchvision model names can be passed as long as they are included in `torchvision.models`, below are some examples.

examples/3.x_api/pytorch/nlp/huggingface_models/question-answering/quantization/static_quant/ipex/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ python run_qa.py \
   --output_dir ./savedresult
 ```

-## 2. Quantization with XPU
+## 2. Quantization with Intel GPU
 ### 2.1 Environment Setting
 Please build an IPEX docker container according to the [official guide](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.30%2bxpu&os=linux%2fwsl2&package=docker).

examples/README.md

Lines changed: 1 addition & 1 deletion
@@ -496,7 +496,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
 <td>bert-large-uncased-whole-word-masking-finetuned-squad</td>
 <td>Natural Language Processing</td>
 <td>Post-Training Static Quantization</td>
-<td><a href="./pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/fx">fx</a> / <a href="./pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex">ipex(xpu)</a></td>
+<td><a href="./pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/fx">fx</a> / <a href="./pytorch/nlp/huggingface_models/question-answering/quantization/ptq_static/ipex">ipex(Intel GPU)</a></td>
 </tr>
 <tr>
 <td>distilbert-base-uncased-distilled-squad</td>
