diff --git a/.gitignore.swp b/.gitignore.swp
-12 KB
diff --git a/README.md b/README.md
Lines changed: 7 additions & 1 deletion
diff --git a/README_zh-CN.md b/README_zh-CN.md
Lines changed: 6 additions & 1 deletion
diff --git a/docs/en/algorithm/layout_detection.rst b/docs/en/algorithm/layout_detection.rst
Lines changed: 26 additions & 29 deletions
diff --git a/docs/en/get_started/installation.rst b/docs/en/get_started/installation.rst
Lines changed: 12 additions & 7 deletions
diff --git a/docs/en/get_started/quickstart.rst b/docs/en/get_started/quickstart.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/zh_cn/_build/doctrees/algorithm/layout_detection.doctree b/docs/zh_cn/_build/doctrees/algorithm/layout_detection.doctree
1.4 KB
diff --git a/docs/zh_cn/_build/doctrees/environment.pickle b/docs/zh_cn/_build/doctrees/environment.pickle
1 Byte
diff --git a/docs/zh_cn/_build/doctrees/get_started/installation.doctree b/docs/zh_cn/_build/doctrees/get_started/installation.doctree
1.76 KB
diff --git a/docs/zh_cn/_build/doctrees/get_started/pretrained_model.doctree b/docs/zh_cn/_build/doctrees/get_started/pretrained_model.doctree
-6 Bytes
diff --git a/docs/zh_cn/_build/doctrees/get_started/quickstart.doctree b/docs/zh_cn/_build/doctrees/get_started/quickstart.doctree
-153 Bytes
diff --git a/docs/zh_cn/_build/html/_sources/algorithm/layout_detection.rst b/docs/zh_cn/_build/html/_sources/algorithm/layout_detection.rst
Lines changed: 28 additions & 31 deletions
diff --git a/docs/zh_cn/_build/html/_sources/get_started/installation.rst b/docs/zh_cn/_build/html/_sources/get_started/installation.rst
Lines changed: 9 additions & 3 deletions
diff --git a/docs/zh_cn/_build/html/_sources/get_started/pretrained_model.rst b/docs/zh_cn/_build/html/_sources/get_started/pretrained_model.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/zh_cn/_build/html/_sources/get_started/quickstart.rst b/docs/zh_cn/_build/html/_sources/get_started/quickstart.rst
Lines changed: 2 additions & 2 deletions
@@ -7,6 +7,8 @@
 
 English | [简体中文](./README_zh-CN.md)
 
+[PDF-Extract-Kit-1.0 Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html)
+
 [[Models (🤗Hugging Face)]](https://huggingface.co/opendatalab/PDF-Extract-Kit) | [[Models(<img src="./assets/readme/modelscope_logo.png" width="20px">ModelScope)]](https://www.modelscope.cn/models/OpenDataLab/PDF-Extract-Kit) 
 
 🔥🔥🔥 [MinerU: Efficient Document Content Extraction Tool Based on PDF-Extract-Kit](https://github.com/opendatalab/MinerU)
@@ -88,7 +90,9 @@ pip install -r requirements.txt
 ```
 > **Note:** If your device does not support GPU, please install the CPU version dependencies using `requirements-cpu.txt` instead of `requirements.txt`.
 
-### Refer to [Model Download](models/README.md) to download the required model weights.
+### Model Download
+
+Refer to the [Model Weights Download Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html) to download the required model weights. You can download all of the weights or only the ones you need; the tutorial gives detailed instructions.
 
 ### Running Demos
 
@@ -120,6 +124,8 @@ python scripts/formula_recognition.py --config=configs/formula_recognition.yaml
 ```
 You can view the formula recognition results in the `outputs/layout_detection` folder.
 
+> **Note:** For more details on using the model, please refer to the [PDF-Extract-Kit-1.0 Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html).
+
 > This project focuses on using models for `high-quality` content extraction from `diverse` documents and does not involve reconstructing extracted content into new documents, such as PDF to Markdown. For such needs, please refer to our other GitHub project: [MinerU](https://github.com/opendatalab/MinerU).
 
 ## To-Do List
 
@@ -7,6 +7,8 @@
 
 [English](./README.md) | 简体中文
 
+[PDF-Extract-Kit-1.0中文教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html)
+
 [[Models (🤗Hugging Face)]](https://huggingface.co/opendatalab/PDF-Extract-Kit) | [[Models(<img src="./assets/readme/modelscope_logo.png" width="20px">ModelScope)]](https://www.modelscope.cn/models/OpenDataLab/PDF-Extract-Kit) 
 
 🔥🔥🔥 [MinerU：基于PDF-Extract-Kit的高效文档内容提取工具](https://github.com/opendatalab/MinerU)
@@ -95,7 +97,9 @@ pip install -r requirements.txt
 ```
 > **注意：** 如果你的设备不支持 GPU，请使用 `requirements-cpu.txt` 安装 CPU 版本的依赖。
 
-### 参考[模型下载](models/README.md)下载所需模型权重
+### 模型下载
+
+参考[模型权重下载教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html)下载所需模型权重。注：可以选择全部下载，也可以选择部分下载，具体操作参考教程。
 
 
 ### Demo运行
@@ -130,6 +134,7 @@ python scripts/formula_recognition.py --config=configs/formula_recognition.yaml
 ```
 你可以在 `outputs/layout_detection` 文件夹下查看公式识别结果。
 
+> **注意：** 更多模型使用细节请查看 [PDF-Extract-Kit-1.0 中文教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html)。
 
 > 本项目专注使用模型对`多样性`文档进行`高质量`内容提取，不涉及提取后内容拼接成新文档，如PDF转Markdown。如果有此类需求，请参考我们另一个Github项目: [MinerU](https://github.com/opendatalab/MinerU)
 
 
@@ -12,44 +12,20 @@ Layout detection is a fundamental task in document content extraction, aiming to
 Model Usage
 =================
 
-The layout detection model supports layoutlmv3 and yolov10. Once the environment is set up, you can run the layout detection algorithm script by executing ```scripts/layout_detection.py```.
+The layout detection model supports ``YOLOv10``, ``DocLayout-YOLO`` and ``LayoutLMv3``. Once the environment is set up, you can run the layout detection algorithm script by executing ``scripts/layout_detection.py``.
 
-**1. layoutlmv3**
+**Run demo**
 
 .. code:: shell
 
-   $ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
-   
-**2. yolov10**
-
-.. code:: shell
-
-   $ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
+   $ python scripts/layout_detection.py --config configs/layout_detection.yaml
 
 Model Configuration
 -----------------
 
-**1. layoutlmv3**
-
-.. code:: yaml
-
-    inputs: assets/demo/layout_detection
-    outputs: outputs/layout_detection
-    tasks:
-      layout_detection:
-        model: layout_detection_layoutlmv3
-        model_config:
-          model_path: path/to/layoutlmv3_model
-
-- inputs/outputs: Define the input file path and the directory for visualization output.
-- tasks: Define the task type, currently only a layout detection task is included.
-- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
-- model_config: Define the model configuration.
-- model_path: Path to the model weights.
-
-**2. yolov10**
+**1. YOLOv10**
 
-Compared to layoutlmv3, yolov10 has faster inference speed and supports batch mode inference.
+Compared to LayoutLMv3, YOLOv10 has faster inference speed and supports batch mode inference.
 
 .. code:: yaml
 
@@ -80,6 +56,27 @@ Compared to layoutlmv3, yolov10 has faster inference speed and supports batch mo
 - visualize: Whether to visualize the model results; visualized results will be saved in the outputs directory.
 - rect: Whether to enable rectangular inference, default is True. If set to True, images in the same batch will be scaled while maintaining aspect ratio and padded to the same size; if False, all images in the same batch will be resized to (img_size, img_size) for inference.
 
+
+**2. LayoutLMv3**
+
+.. code:: yaml
+
+    inputs: assets/demo/layout_detection
+    outputs: outputs/layout_detection
+    tasks:
+      layout_detection:
+        model: layout_detection_layoutlmv3
+        model_config:
+          model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: Define the input file path and the directory for visualization output.
+- tasks: Define the task type; currently only a layout detection task is included.
+- model: Specify the model type, e.g., layout_detection_layoutlmv3.
+- model_config: Define the model configuration.
+- model_path: Path to the model weights.
+
+
+
 Diverse Input Support
 -----------------
 
 
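The `rect` option described in the YOLOv10 configuration above decides what shape the images of one batch share at inference time. The following is a minimal sketch of that rule, not the project's implementation: the helper names `scale_long_side` and `batch_input_shape` are hypothetical, the long-side scaling follows the documented meaning of `img_size`, and details a real detector may add (such as rounding the padded size up to a stride multiple) are deliberately left out.

```python
from typing import List, Tuple


def scale_long_side(width: int, height: int, img_size: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals img_size, keeping aspect ratio."""
    ratio = img_size / max(width, height)
    return max(1, round(width * ratio)), max(1, round(height * ratio))


def batch_input_shape(
    sizes: List[Tuple[int, int]], img_size: int = 1280, rect: bool = True
) -> Tuple[int, int]:
    """Return the common (width, height) that every image in the batch ends up with.

    rect=True : each image keeps its aspect ratio and is padded up to the
                largest scaled width and height found in the batch.
    rect=False: every image is resized to (img_size, img_size).
    """
    if not rect:
        return img_size, img_size
    scaled = [scale_long_side(w, h, img_size) for w, h in sizes]
    return max(w for w, _ in scaled), max(h for _, h in scaled)


if __name__ == "__main__":
    pages = [(2000, 1000), (1600, 1200)]  # illustrative (width, height) pairs
    print(batch_input_shape(pages, img_size=1000, rect=True))
    print(batch_input_shape(pages, img_size=1000, rect=False))
```

With `rect` enabled, a batch of wide pages is padded only as far as its tallest scaled member, so less of the input is padding than with a square resize. That is a plausible reason for the option to default to True.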
@@ -7,22 +7,27 @@ In this section, we will demonstrate how to install PDF-Extract-Kit.
 Best Practices
 ==============
 
-We recommend users follow our best practices to install PDF-Extract-Kit.
-It is recommended to use a Python 3.10 conda virtual environment to install PDF-Extract-Kit.
+We recommend installing PDF-Extract-Kit in a Python 3.10 conda virtual environment, following the steps below.
 
-**Step 1.** Use conda to create a Python 3.10 virtual environment
+**Step 1.** Create a Python 3.10 virtual environment using conda.
 
 .. code-block:: console
 
     $ conda create -n pdf-extract-kit-1.0 python=3.10 -y
     $ conda activate pdf-extract-kit-1.0
 
-**Step 2.** Install the dependencies for PDF-Extract-Kit
+**Step 2.** Install the dependencies for PDF-Extract-Kit.
 
 .. code-block:: console
 
+    $ # For GPU devices
     $ pip install -r requirements.txt
+    $ # For CPU-only devices
+    $ pip install -r requirements-cpu.txt
 
 .. note::
 
-    If your device does not support GPU, please install the CPU version dependencies using ``requirements-cpu.txt`` instead of requirements.txt
+    To keep environment setup simple, ``requirements.txt`` covers only the dependencies of the current best models:
+
+    - Layout Detection: YOLO series (YOLOv10, DocLayout-YOLO)
+    - Formula Detection: YOLO series (YOLOv8)
+    - Formula Recognition: UniMERNet
+    - OCR: PaddleOCR
+
+    Other models, such as LayoutLMv3, require additional environment setup. For details, see \ :ref:`Layout Detection Algorithms <algorithm_layout_detection>`.
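The installation hunk above offers two requirements files, one for GPU devices and one for CPU-only devices. As a rough sketch of how a setup script might choose between them, the hypothetical helpers below build the matching `pip` command. The check for `nvidia-smi` on the `PATH` is only a heuristic assumption for "this machine has a usable GPU"; it is not something the project prescribes.

```python
import shutil
from typing import List


def requirements_file(has_gpu: bool) -> str:
    """Pick the requirements file named in the installation guide."""
    return "requirements.txt" if has_gpu else "requirements-cpu.txt"


def pip_command(has_gpu: bool) -> List[str]:
    """Build the pip invocation as an argument list (suitable for subprocess.run)."""
    return ["pip", "install", "-r", requirements_file(has_gpu)]


def looks_like_gpu_host() -> bool:
    """Heuristic assumption: an NVIDIA driver tool on PATH suggests a GPU device."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pip_command(looks_like_gpu_host())))
```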
@@ -9,7 +9,7 @@ Layout Detection Example
 
 Layout detection offers several models: ``LayoutLMv3``, ``YOLOv10``, and ``DocLayout-YOLO``. Compared to ``LayoutLMv3``, ``YOLOv10`` is faster. ``DocLayout-YOLO`` is based on YOLOv10 and includes diverse document pre-training and model optimization, offering both speed and high accuracy.
 
-**1. Using Layout Detection Models
+**1. Using Layout Detection Models**
 
 .. code-block:: console
 
 
@@ -7,49 +7,26 @@
 简介
 =================
 
-布局检测是文档内容提取的基础任务，目标对页面中不同类型的区域进行定位：如图像、表格、文本、标题等，方便后续高质量内容提取。对于文本、标题等区域，可以基于OCR模型进行文字识别，对于表格区域可以基于表格识别模型进行转换。
+``布局检测`` 是文档内容提取的基础任务，目标对页面中不同类型的区域进行定位：如 ``图像``、 ``表格``、 ``文本``、 ``标题`` 等，方便后续高质量内容提取。对于 ``文本``、 ``标题`` 等区域，可以基于 ``OCR模型`` 进行文字识别，对于表格区域可以基于表格识别模型进行转换。
 
 模型使用
 =================
 
-布局检测模型支持layoutlmv3以及yolov10，在配置好环境的情况下，直接执行```scripts/layout_detection.py```即可运行布局检测算法脚本。
+布局检测模型支持 ``YOLOv10``、 ``DocLayout-YOLO`` 和 ``LayoutLMv3``，在配置好环境的情况下，直接执行 ``scripts/layout_detection.py`` 即可运行布局检测算法脚本。
 
-**1. layoutlmv3**
-
-.. code:: shell
-
-   $ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
 
-**2. yolov10**
+**执行布局检测程序**
 
 .. code:: shell
 
-   $ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
+   $ python scripts/layout_detection.py --config configs/layout_detection.yaml
 
 模型配置
 -----------------
 
-**1. layoutlmv3**
-
-.. code:: yaml
-
-    inputs: assets/demo/layout_detection
-    outputs: outputs/layout_detection
-    tasks:
-      layout_detection:
-        model: layout_detection_layoutlmv3
-        model_config:
-          model_path: path/to/layoutlmv3_model
-
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型，当前只包含一个布局检测任务
-- model: 定义具体模型类型，例如layout_detection_layoutlmv3
-- model_config: 定义模型配置
-- model_path: 模型权重路径
-
-**2. yolov10**
+**1. YOLOv10**
 
-和layoutlmv3相比，yolov10推理速度更快，支持batch模式推理
+和LayoutLMv3相比，YOLOv10推理速度更快，支持batch模式推理
 
 .. code:: yaml
 
@@ -70,7 +47,7 @@
 
 - inputs/outputs: 分别定义输入文件路径和可视化输出目录
 - tasks: 定义任务类型，当前只包含一个布局检测任务
-- model: 定义具体模型类型，例如layout_detection_yolo
+- model: 定义具体模型类型，例如 ``layout_detection_yolo``
 - model_config: 定义模型配置
 - img_size: 定义图像长边大小，短边会根据长边等比例缩放，默认长边保持1280
 - conf_thres: 定义置信度阈值，仅检测大于该阈值的目标
@@ -81,10 +58,30 @@
 - rect: 是否开启rectangular推理，默认为True。若设为True，同一batch中的图像会保持长宽比进行缩放并且padding到同一尺寸；若为False，同一batch中所有图像都resize到(img_size, img_size)尺寸进行推理
 - visualize: 是否对模型结果进行可视化，可视化结果会保存在outputs目录下
 
+
+**2. LayoutLMv3**
+
+.. code:: yaml
+
+    inputs: assets/demo/layout_detection
+    outputs: outputs/layout_detection
+    tasks:
+      layout_detection:
+        model: layout_detection_layoutlmv3
+        model_config:
+          model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: 分别定义输入文件路径和可视化输出目录
+- tasks: 定义任务类型，当前只包含一个布局检测任务
+- model: 定义具体模型类型，例如layout_detection_layoutlmv3
+- model_config: 定义模型配置
+- model_path: 模型权重路径
+
+
 多样化输入支持
 -----------------
 
-PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录``等输入形式。
+PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录`` 等输入形式。
 
 .. note::
 
 
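The hunk above lists four accepted input forms for the layout detection script: a single image, a directory containing only images, a single PDF file, and a directory containing only PDF files. A minimal sketch of telling these apart is shown below. The function `classify_input` and the set of image suffixes are assumptions made for illustration; the script's real detection logic may differ.

```python
from pathlib import Path

# Assumed image suffixes for illustration; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
PDF_SUFFIX = ".pdf"


def classify_input(path) -> str:
    """Return one of: single_image, image_dir, single_pdf, pdf_dir."""
    p = Path(path)
    if p.is_file():
        suffix = p.suffix.lower()
        if suffix == PDF_SUFFIX:
            return "single_pdf"
        if suffix in IMAGE_SUFFIXES:
            return "single_image"
        raise ValueError(f"unsupported file type: {p.name}")
    if p.is_dir():
        # Collect the suffixes of the files directly inside the directory.
        suffixes = {child.suffix.lower() for child in p.iterdir() if child.is_file()}
        if suffixes and suffixes <= IMAGE_SUFFIXES:
            return "image_dir"
        if suffixes == {PDF_SUFFIX}:
            return "pdf_dir"
        raise ValueError(f"directory is empty or mixes file types: {p}")
    raise FileNotFoundError(str(p))
```

Rejecting mixed directories up front mirrors the documented restriction that a directory must contain only images or only PDF files.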
@@ -20,9 +20,15 @@
 **步骤 2.** 安装 PDF-Extract-Kit 的依赖项
 
 .. code-block:: console
 
+    $ # 对于GPU设备
     $ pip install -r requirements.txt
+    $ # 对于仅CPU设备
+    $ pip install -r requirements-cpu.txt
 
 .. note::
 
-    如果你的设备不支持 GPU，请使用 ``requirements-cpu.txt`` 安装 CPU 版本的依赖。
+    考虑到用户环境配置的便捷性， ``requirements.txt`` 只包含当前最好模型需要的环境，目前包含：
+
+    - 布局检测：YOLO系列（YOLOv10, DocLayout-YOLO）
+    - 公式检测：YOLO系列（YOLOv8）
+    - 公式识别：UniMERNet
+    - OCR：PaddleOCR
+
+    对于其他模型，如LayoutLMv3，需要单独安装环境，具体见\ :ref:`布局检测算法 <algorithm_layout_detection>`。
@@ -37,7 +37,7 @@ HuggingFace
 
    .. code:: console
 
-      $ # 默认为 `~/.cache/huggingface/`
+      $ # 默认为 ~/.cache/huggingface/
       $ export HF_HOME=Coming soon!
 
 .. tip::
 
@@ -9,9 +9,9 @@
 布局检测示例
 ==============
 
-布局检测提供了多种模型: ``LayoutLMv3``, ``YOLOv10``,  ``DocLayout-YOLO``， 相比与 ``LayoutLMv3``, ``YOLOv10``速度更快， ``DocLayout-YOLO``则是基于 ``YOLOv10`` 的基础上进行多样性文档预训练及模型优化，速度快，精度高。
+布局检测提供了多种模型: ``LayoutLMv3``、 ``YOLOv10``、 ``DocLayout-YOLO``。相比于 ``LayoutLMv3``， ``YOLOv10`` 速度更快； ``DocLayout-YOLO`` 则在 ``YOLOv10`` 的基础上进行多样性文档预训练及模型优化，速度快，精度高。
 
-**1. 使用布局检测模型
+**1. 使用布局检测模型**
 
 .. code-block:: console