
Commit d92f6f6

Update tutorial docs
1 parent 96b303d commit d92f6f6

25 files changed (+189, -172 lines)

.gitignore.swp

Binary file not shown (-12 KB).

README.md

Lines changed: 7 additions & 1 deletion
@@ -7,6 +7,8 @@
 
 English | [简体中文](./README_zh-CN.md)
 
+[PDF-Extract-Kit-1.0 Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html)
+
 [[Models (🤗Hugging Face)]](https://huggingface.co/opendatalab/PDF-Extract-Kit) | [[Models(<img src="./assets/readme/modelscope_logo.png" width="20px">ModelScope)]](https://www.modelscope.cn/models/OpenDataLab/PDF-Extract-Kit)
 
 🔥🔥🔥 [MinerU: Efficient Document Content Extraction Tool Based on PDF-Extract-Kit](https://github.com/opendatalab/MinerU)
@@ -88,7 +90,9 @@ pip install -r requirements.txt
 ```
 > **Note:** If your device does not support GPU, please install the CPU version dependencies using `requirements-cpu.txt` instead of `requirements.txt`.
 
-### Refer to [Model Download](models/README.md) to download the required model weights.
+### Model Download
+
+Please refer to the [Model Weights Download Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html) to download the required model weights. Note: You can choose to download all the weights or select specific ones. For detailed instructions, please refer to the tutorial.
 
 ### Running Demos
 
@@ -120,6 +124,8 @@ python scripts/formula_recognition.py --config=configs/formula_recognition.yaml
 ```
 You can view the formula recognition results in the `outputs/layout_detection` folder.
 
+> **Note:** For more details on using the model, please refer to the[PDF-Extract-Kit-1.0 Tutorial](https://pdf-extract-kit.readthedocs.io/en/latest/get_started/pretrained_model.html).
+
 > This project focuses on using models for `high-quality` content extraction from `diverse` documents and does not involve reconstructing extracted content into new documents, such as PDF to Markdown. For such needs, please refer to our other GitHub project: [MinerU](https://github.com/opendatalab/MinerU).
 
 ## To-Do List

README_zh-CN.md

Lines changed: 6 additions & 1 deletion
@@ -7,6 +7,8 @@
 
 [English](./README.md) | 简体中文
 
+[PDF-Extract-Kit-1.0中文教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html)
+
 [[Models (🤗Hugging Face)]](https://huggingface.co/opendatalab/PDF-Extract-Kit) | [[Models(<img src="./assets/readme/modelscope_logo.png" width="20px">ModelScope)]](https://www.modelscope.cn/models/OpenDataLab/PDF-Extract-Kit)
 
 🔥🔥🔥 [MinerU:基于PDF-Extract-Kit的高效文档内容提取工具](https://github.com/opendatalab/MinerU)
@@ -95,7 +97,9 @@ pip install -r requirements.txt
 ```
 > **注意:** 如果你的设备不支持 GPU,请使用 `requirements-cpu.txt` 安装 CPU 版本的依赖。
 
-### 参考[模型下载](models/README.md)下载所需模型权重
+### 模型下载
+
+参考[模型权重下载教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html)下载所需模型权重。注:可以选择全部下载,也可以选择部分下载,具体操作参考教程。
 
 
 ### Demo运行
@@ -130,6 +134,7 @@ python scripts/formula_recognition.py --config=configs/formula_recognition.yaml
 ```
 你可以在 `outputs/layout_detection` 文件夹下查看公式识别结果。
 
+> **注意:** 更多模型使用细节请查看[PDF-Extract-Kit-1.0 中文教程](https://pdf-extract-kit.readthedocs.io/zh-cn/latest/get_started/pretrained_model.html).
 
 > 本项目专注使用模型对`多样性`文档进行`高质量`内容提取,不涉及提取后内容拼接成新文档,如PDF转Markdown。如果有此类需求,请参考我们另一个Github项目: [MinerU](https://github.com/opendatalab/MinerU)
 

docs/en/algorithm/layout_detection.rst

Lines changed: 26 additions & 29 deletions
@@ -12,44 +12,20 @@ Layout detection is a fundamental task in document content extraction, aiming to
 Model Usage
 =================
 
-The layout detection model supports layoutlmv3 and yolov10. Once the environment is set up, you can run the layout detection algorithm script by executing ```scripts/layout_detection.py```.
+The layout detection model supports ``YOLOv10``, ``DocLayout-YOLO`` and ``LayoutLMv3``. Once the environment is set up, you can run the layout detection algorithm script by executing ``scripts/layout_detection.py``.
 
-**1. layoutlmv3**
+**Run demo**
 
 .. code:: shell
 
-$ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
-
-**2. yolov10**
-
-.. code:: shell
-
-$ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
+$ python scripts/layout_detection.py --config configs/layout_detection.yaml
 
 Model Configuration
 -----------------
 
-**1. layoutlmv3**
-
-.. code:: yaml
-
-inputs: assets/demo/layout_detection
-outputs: outputs/layout_detection
-tasks:
-layout_detection:
-model: layout_detection_layoutlmv3
-model_config:
-model_path: path/to/layoutlmv3_model
-
-- inputs/outputs: Define the input file path and the directory for visualization output.
-- tasks: Define the task type, currently only a layout detection task is included.
-- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
-- model_config: Define the model configuration.
-- model_path: Path to the model weights.
-
-**2. yolov10**
+**1. yolov10**
 
-Compared to layoutlmv3, yolov10 has faster inference speed and supports batch mode inference.
+Compared to LayoutLMv3, YOLOv10 has faster inference speed and supports batch mode inference.
 
 .. code:: yaml
 
@@ -80,6 +56,27 @@ Compared to layoutlmv3, yolov10 has faster inference speed and supports batch mo
 - visualize: Whether to visualize the model results; visualized results will be saved in the outputs directory.
 - rect: Whether to enable rectangular inference, default is True. If set to True, images in the same batch will be scaled while maintaining aspect ratio and padded to the same size; if False, all images in the same batch will be resized to (img_size, img_size) for inference.
 
+
+**2. layoutlmv3**
+
+.. code:: yaml
+
+inputs: assets/demo/layout_detection
+outputs: outputs/layout_detection
+tasks:
+layout_detection:
+model: layout_detection_layoutlmv3
+model_config:
+model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: Define the input file path and the directory for visualization output.
+- tasks: Define the task type, currently only a layout detection task is included.
+- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
+- model_config: Define the model configuration.
+- model_path: Path to the model weights.
+
+
+
 Diverse Input Support
 -----------------
 

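The `rect` option above, together with `img_size` (which the zh_cn version of this page defines as the target long side, default 1280), decides the shape each batch is fed at. Below is a minimal Python sketch of that arithmetic as the docs describe it. The function names are hypothetical, and real YOLO pipelines typically also round padded sizes up to a multiple of the network stride (for example 32), which this sketch deliberately omits.

```python
def scale_long_side(width: int, height: int, img_size: int = 1280) -> tuple[int, int]:
    """Resize so the long side equals img_size, keeping the aspect ratio."""
    long_side = max(width, height)
    return round(width * img_size / long_side), round(height * img_size / long_side)


def batch_input_shapes(
    sizes: list[tuple[int, int]], img_size: int = 1280, rect: bool = True
) -> tuple[list[tuple[int, int]], tuple[int, int]]:
    """Return (per-image resized (w, h), common padded (w, h)) for one batch.

    Hypothetical helper that only illustrates the documented `rect` behaviour.
    """
    if not rect:
        # rect=False: every image is resized directly to (img_size, img_size),
        # so aspect ratios are not preserved and no padding is needed.
        square = (img_size, img_size)
        return [square] * len(sizes), square
    # rect=True: keep each image's aspect ratio, then pad every image in the
    # batch up to the largest resized width and the largest resized height.
    resized = [scale_long_side(w, h, img_size) for w, h in sizes]
    pad_w = max(w for w, _ in resized)
    pad_h = max(h for _, h in resized)
    return resized, (pad_w, pad_h)


if __name__ == "__main__":
    pages = [(1000, 2000), (1200, 1600)]  # two portrait pages, (width, height)
    print(batch_input_shapes(pages))
    print(batch_input_shapes(pages, rect=False))
```

For these two portrait pages, `rect=True` pads the batch to 960×1280 instead of stretching both pages to 1280×1280, so the pages keep their proportions and less of each input is padding.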
docs/en/get_started/installation.rst

Lines changed: 12 additions & 7 deletions
@@ -7,22 +7,27 @@ In this section, we will demonstrate how to install PDF-Extract-Kit.
 Best Practices
 ==============
 
-We recommend users follow our best practices to install PDF-Extract-Kit.
-It is recommended to use a Python 3.10 conda virtual environment to install PDF-Extract-Kit.
+We recommend users follow our best practices for installing PDF-Extract-Kit. It is recommended to use a Python 3.10 conda virtual environment for the installation.
 
-**Step 1.** Use conda to create a Python 3.10 virtual environment
+**Step 1.** Create a Python 3.10 virtual environment using conda.
 
 .. code-block:: console
 
 $ conda create -n pdf-extract-kit-1.0 python=3.10 -y
 $ conda activate pdf-extract-kit-1.0
 
-**Step 2.** Install the dependencies for PDF-Extract-Kit
+**Step 2.** Install the dependencies for PDF-Extract-Kit.
 
 .. code-block:: console
-
+$ # For GPU devices
 $ pip install -r requirements.txt
+$ # For CPU-only devices
+$ pip install -r requirements-cpu.txt
 
 .. note::
-
-If your device does not support GPU, please install the CPU version dependencies using ``requirements-cpu.txt`` instead of requirements.txt
+For the convenience of user environment configuration, requirements.txt only includes the environment needed for the current best models, which currently include:
+- Layout Detection: YOLO series (YOLOv10, DocLayout-YOLO)
+- Formula Detection: YOLO series (YOLOv8)
+- Formula Recognition: UniMERNet
+- OCR: PaddleOCR
+For other models, such as LayoutLMv3, additional environment setup is required. For details, see \ :ref:`Layout Detection Algorithms <algorithm_layout_detection>`.

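The installation note above ties each requirements file to a device type and lists which models `requirements.txt` covers. A small Python sketch of that decision logic follows. It assumes that finding the `nvidia-smi` executable on `PATH` is an acceptable stand-in for "has a usable GPU" (a heuristic, not something the docs prescribe) and that `requirements-cpu.txt` covers the same model set; the helper names and the `COVERED_MODELS` layout are hypothetical, while its contents are copied from the note.

```python
import shutil

# Models whose dependencies requirements.txt covers, copied from the note above.
# Assumption: requirements-cpu.txt covers the same set for CPU-only machines.
COVERED_MODELS: dict[str, set[str]] = {
    "layout_detection": {"YOLOv10", "DocLayout-YOLO"},
    "formula_detection": {"YOLOv8"},
    "formula_recognition": {"UniMERNet"},
    "ocr": {"PaddleOCR"},
}


def gpu_probably_available() -> bool:
    """Heuristic (assumption): NVIDIA driver installs put nvidia-smi on PATH."""
    return shutil.which("nvidia-smi") is not None


def requirements_file(has_gpu: bool) -> str:
    """Pick the requirements file for the device type, as in Step 2."""
    return "requirements.txt" if has_gpu else "requirements-cpu.txt"


def needs_extra_setup(task: str, model: str) -> bool:
    """True if the default requirements do not cover this task/model pair."""
    return model not in COVERED_MODELS.get(task, set())


if __name__ == "__main__":
    print(f"pip install -r {requirements_file(gpu_probably_available())}")
    print("LayoutLMv3 needs extra setup:", needs_extra_setup("layout_detection", "LayoutLMv3"))
```

Keeping the device probe separate from `requirements_file` means the choice can be overridden by hand on machines where the heuristic is wrong.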
docs/en/get_started/quickstart.rst

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Layout Detection Example
 
 Layout detection offers several models: ``LayoutLMv3``, ``YOLOv10``, and ``DocLayout-YOLO``. Compared to ``LayoutLMv3``, ``YOLOv10`` is faster. ``DocLayout-YOLO`` is based on YOLOv10 and includes diverse document pre-training and model optimization, offering both speed and high accuracy.
 
-**1. Using Layout Detection Models
+**1. Using Layout Detection Models**
 
 .. code-block:: console
 

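This commit replaces the per-model configs (`configs/layout_detection_layoutlmv3.yaml`, `configs/layout_detection_yolo.yaml`) with a single `configs/layout_detection.yaml`, so the layout model is presumably selected through the `model` key rather than through the file name. The sketch below mirrors the LayoutLMv3 YAML from the layout detection docs as a Python dict and reads out one task's settings. The nesting is inferred from the field descriptions (indentation was lost in this view), the dict is assumed to be what a YAML loader such as PyYAML's `yaml.safe_load` would return, and `task_settings` is a hypothetical helper, not the loader PDF-Extract-Kit actually uses.

```python
from typing import Any

# Hypothetical dict form of the LayoutLMv3 config shown in the layout detection docs.
CONFIG: dict[str, Any] = {
    "inputs": "assets/demo/layout_detection",
    "outputs": "outputs/layout_detection",
    "tasks": {
        "layout_detection": {
            "model": "layout_detection_layoutlmv3",
            "model_config": {"model_path": "path/to/layoutlmv3_model"},
        }
    },
}


def task_settings(
    config: dict[str, Any], task: str = "layout_detection"
) -> tuple[str, dict[str, Any]]:
    """Return (model name, model_config) for one task, failing loudly on gaps."""
    for key in ("inputs", "outputs", "tasks"):
        if key not in config:
            raise KeyError(f"missing top-level key: {key}")
    task_cfg = config["tasks"][task]  # KeyError if the task is not configured
    return task_cfg["model"], task_cfg.get("model_config", {})


if __name__ == "__main__":
    model, model_cfg = task_settings(CONFIG)
    print(model, model_cfg)
```

Under this reading, switching to YOLOv10 would mean changing `model` to `layout_detection_yolo` and supplying that model's `model_config` fields, while the command line stays the same.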
docs/zh_cn/_build/html/_sources/algorithm/layout_detection.rst

Lines changed: 28 additions & 31 deletions
@@ -7,49 +7,26 @@
 简介
 =================
 
-布局检测是文档内容提取的基础任务,目标对页面中不同类型的区域进行定位:如图像、表格、文本、标题等,方便后续高质量内容提取。对于文本、标题等区域,可以基于OCR模型进行文字识别,对于表格区域可以基于表格识别模型进行转换。
+``布局检测`` 是文档内容提取的基础任务,目标对页面中不同类型的区域进行定位:如 ``图像``、 ``表格``、 ``文本``、 ``标题``等,方便后续高质量内容提取。对于 ``文本``、 ``标题``等区域,可以基于 ``OCR模型``进行文字识别,对于表格区域可以基于表格识别模型进行转换。
 
 模型使用
 =================
 
-布局检测模型支持layoutlmv3以及yolov10,在配置好环境的情况下,直接执行```scripts/layout_detection.py```即可运行布局检测算法脚本。
+布局检测模型支持 ``YOLOv10``, ``DocLayout-YOLO``和 ``LayoutLMv3``,在配置好环境的情况下,直接执行 ``scripts/layout_detection.py`` 即可运行布局检测算法脚本。
 
-**1. layoutlmv3**
-
-.. code:: shell
-
-$ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
 
-**2. yolov10**
+**执行布局检测程序**
 
 .. code:: shell
 
-$ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
+$ python scripts/layout_detection.py --config configs/layout_detection.yaml
 
 模型配置
 -----------------
 
-**1. layoutlmv3**
-
-.. code:: yaml
-
-inputs: assets/demo/layout_detection
-outputs: outputs/layout_detection
-tasks:
-layout_detection:
-model: layout_detection_layoutlmv3
-model_config:
-model_path: path/to/layoutlmv3_model
-
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型,当前只包含一个布局检测任务
-- model: 定义具体模型类型,例如layout_detection_layoutlmv3
-- model_config: 定义模型配置
-- model_path: 模型权重路径
-
-**2. yolov10**
+**1. YOLOv10**
 
-和layoutlmv3相比,yolov10推理速度更快,支持batch模式推理
+和LayoutLMv3相比,YOLOv10推理速度更快,支持batch模式推理
 
 .. code:: yaml
 
@@ -70,7 +47,7 @@
 
 - inputs/outputs: 分别定义输入文件路径和可视化输出目录
 - tasks: 定义任务类型,当前只包含一个布局检测任务
-- model: 定义具体模型类型,例如layout_detection_yolo
+- model: 定义具体模型类型,例如 ``layout_detection_yolo``
 - model_config: 定义模型配置
 - img_size: 定义图像长边大小,短边会根据长边等比例缩放,默认长边保持1280
 - conf_thres: 定义置信度阈值,仅检测大于该阈值的目标
@@ -81,10 +58,30 @@
 - rect: 是否开启rectangular推理,默认为True。若设为True,同一batch中的图像会保持长宽比进行缩放并且padding到同一尺寸;若为False,同一batch中所有图像都resize到(img_size, img_size)尺寸进行推理
 - visualize: 是否对模型结果进行可视化,可视化结果会保存在outputs目录下
 
+
+**2. LayoutLMv3**
+
+.. code:: yaml
+
+inputs: assets/demo/layout_detection
+outputs: outputs/layout_detection
+tasks:
+layout_detection:
+model: layout_detection_layoutlmv3
+model_config:
+model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: 分别定义输入文件路径和可视化输出目录
+- tasks: 定义任务类型,当前只包含一个布局检测任务
+- model: 定义具体模型类型,例如layout_detection_layoutlmv3
+- model_config: 定义模型配置
+- model_path: 模型权重路径
+
+
 多样化输入支持
 -----------------
 
-PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录``等输入形式。
+PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录`` 等输入形式。
 
 .. note::
 

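The zh_cn layout detection page lists four accepted input forms for the script: a single image, a directory containing only images, a single PDF, and a directory containing only PDFs. A minimal Python sketch of how such an input could be classified before processing; the extension sets and the `classify_input` name are assumptions for illustration, not the script's actual logic.

```python
from pathlib import Path

# Assumed extension sets; the script's real list of accepted formats may differ.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
PDF_EXTS = {".pdf"}


def classify_input(path: str) -> str:
    """Return one of: single_image, image_dir, single_pdf, pdf_dir."""
    p = Path(path)
    if p.is_file():
        suffix = p.suffix.lower()
        if suffix in PDF_EXTS:
            return "single_pdf"
        if suffix in IMAGE_EXTS:
            return "single_image"
        raise ValueError(f"unsupported file type: {p.suffix!r}")
    if p.is_dir():
        # Only regular files directly inside the directory are considered;
        # subdirectories are ignored rather than searched recursively.
        suffixes = {f.suffix.lower() for f in p.iterdir() if f.is_file()}
        if suffixes and suffixes <= PDF_EXTS:
            return "pdf_dir"
        if suffixes and suffixes <= IMAGE_EXTS:
            return "image_dir"
        raise ValueError("directory must contain only images or only PDFs")
    raise FileNotFoundError(path)
```

Rejecting mixed directories up front matches the "only images" / "only PDFs" wording and avoids discovering a bad file halfway through a batch.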
docs/zh_cn/_build/html/_sources/get_started/installation.rst

Lines changed: 9 additions & 3 deletions
@@ -20,9 +20,15 @@
 **步骤 2.** 安装 PDF-Extract-Kit 的依赖项
 
 .. code-block:: console
-
+$ # 对于GPU设备
 $ pip install -r requirements.txt
+$ # 对于无CPU设备
+$ pip install -r requirements-cpu.txt
 
 .. note::
-
-如果你的设备不支持 GPU,请使用 ``requirements-cpu.txt`` 安装 CPU 版本的依赖。
+考虑到用户环境配置的便捷性,我们在requirements.txt只包含当前最好模型需要的环境,目前包含
+- 布局检测:YOLO系列(YOLOv10, DocLayout-YOLO)
+- 公式检测:YOLO系列 (YOLOv8)
+- 公式识别:UniMERNet
+- OCR: PaddleOCR
+对于其他模型请,如LayoutLMv3需要单独安装环境,具体见\ :ref:`布局检测算法 <algorithm_layout_detection>`

docs/zh_cn/_build/html/_sources/get_started/pretrained_model.rst

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ HuggingFace
 
 .. code:: console
 
-$ # 默认为 `~/.cache/huggingface/`
+$ # 默认为 ~/.cache/huggingface/
 $ export HF_HOME=Comming soon!
 
 .. tip::

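The zh_cn pretrained-model page points to the `HF_HOME` environment variable and gives its default as `~/.cache/huggingface/`; the `export` value on that page is still a placeholder. As a sketch of how that location is resolved, the function below follows the precedence described in the `huggingface_hub` environment-variable documentation: `HF_HOME` if set, otherwise `$XDG_CACHE_HOME/huggingface`, otherwise `~/.cache/huggingface`. It is a stand-alone re-implementation for illustration, not a call into the library, and `resolve_hf_home` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_hf_home(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the Hugging Face home/cache root from environment variables.

    Precedence: HF_HOME, then $XDG_CACHE_HOME/huggingface, then
    ~/.cache/huggingface. Illustrative re-implementation only.
    """
    if "HF_HOME" in env:
        return Path(env["HF_HOME"]).expanduser()
    cache_root = env.get("XDG_CACHE_HOME", str(Path("~/.cache").expanduser()))
    return Path(cache_root).expanduser() / "huggingface"


if __name__ == "__main__":
    print(resolve_hf_home())
    # Example: pointing the cache at a larger disk before downloading weights.
    print(resolve_hf_home({"HF_HOME": "/data/hf_cache"}))
```

Passing the environment as a mapping keeps the function testable without mutating `os.environ`.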
docs/zh_cn/_build/html/_sources/get_started/quickstart.rst

Lines changed: 2 additions & 2 deletions
@@ -9,9 +9,9 @@
 布局检测示例
 ==============
 
-布局检测提供了多种模型: ``LayoutLMv3``, ``YOLOv10``, ``DocLayout-YOLO``, 相比与 ``LayoutLMv3``, ``YOLOv10``速度更快, ``DocLayout-YOLO``则是基于 ``YOLOv10`` 的基础上进行多样性文档预训练及模型优化,速度快,精度高。
+布局检测提供了多种模型: ``LayoutLMv3`` ``YOLOv10`` ``DocLayout-YOLO``, 相比与 ``LayoutLMv3`` ``YOLOv10`` 速度更快, ``DocLayout-YOLO`` 则是基于 ``YOLOv10`` 的基础上进行多样性文档预训练及模型优化,速度快,精度高。
 
-**1. 使用布局检测模型
+**1. 使用布局检测模型**
 
 .. code-block:: console
 