Commit a361e4c

Update docs
1 parent c3223ef

51 files changed: +515 / -409 lines

docs/en/.readthedocs copy.yaml

Lines changed: 0 additions & 16 deletions
This file was deleted.

docs/en/_static/image/logo.png

57.9 KB
Lines changed: 30 additions & 31 deletions
@@ -1,29 +1,28 @@
 .. _algorithm_formula_detection:
 
 ====================
-公式检测算法
+Formula Detection Algorithm
 ====================
 
-简介
+Introduction
 ====================
 
-公式检测是针对给定的输入图像,检测出图像中所有包含公式的位置(包含行内公式和行间公式)
+Formula detection locates every formula in a given input image, covering both inline and display (block) formulas.
 
 .. note::
 
-公式检测实际上属于布局检测子任务,但由于公式检查的复杂性,我们建议使用单独的公式检测模型解耦。
-这样通常使得数据标注更加方便,且公式检测效果也更好。
+Formula detection is technically a subtask of layout detection. Because formulas are complex, however, we recommend decoupling it into a dedicated formula detection model. This usually makes data annotation easier and improves detection quality.
 
-模型使用
+Model Usage
 ====================
 
-在配置好环境的情况下,直接执行 ``scripts/formula_detection.py`` 即可运行布局检测算法脚本。
+Once the environment is set up, run the formula detection script ``scripts/formula_detection.py``:
 
 .. code:: shell
 
 $ python scripts/formula_detection.py --config configs/formula_detection.yaml
 
-模型配置
+Model Configuration
 --------------------
 
 .. code:: yaml
@@ -41,52 +40,52 @@
 model_path: models/MFD/yolov8/weights.pt
 visualize: True
 
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型,当前只包含一个公式检测任务
-- model: 定义具体模型类型: 当前仅提供YOLO公式检测模型
-- model_config: 定义模型配置
-- img_size: 定义图像长边大小,短边会根据长边等比例缩放
-- conf_thres: 定义置信度阈值,仅检测大于该阈值的目标
-- iou_thres: 定义IoU阈值,去除重叠度大于该阈值的目标
-- batch_size: 定义批量大小,推理时每次同时推理的图像数,一般情况下越大推理速度越快,显卡越好该数值可以设置的越大
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化,可视化结果会保存在outputs目录下。
-
-多样化输入支持
+- inputs/outputs: Define the input file path and the visualization output directory, respectively.
+- tasks: Define the task type; currently only a formula detection task is included.
+- model: Define the model type; currently only the YOLO formula detection model is available.
+- model_config: Define the model configuration.
+- img_size: Define the length of the image's longer side; the shorter side is scaled proportionally to preserve the aspect ratio.
+- conf_thres: Define the confidence threshold; only detections scoring above this threshold are kept.
+- iou_thres: Define the IoU threshold; overlapping detections with an IoU above this value are treated as duplicates and removed.
+- batch_size: Define the batch size, i.e., the number of images processed together in one inference pass. Larger batches generally run faster, and GPUs with more memory can handle larger batches.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results; visualizations are saved in the outputs directory.
+
+Diverse Input Support
 --------------------
 
-PDF-Extract-Kit中的公式检测脚本支持 ``单个图像`` 、 ``只包含图像文件的目录`` 、 ``单个PDF文件`` 、 ``只包含PDF文件的目录`` 等输入形式。
+The formula detection script in PDF-Extract-Kit accepts a single image, a directory containing only image files, a single PDF file, or a directory containing only PDF files.
 
 .. note::
 
-根据自己实际数据形式,修改configs/formula_detection.yaml中inputs的路径即可
-- 单个图像: path/to/image
-- 图像文件夹: path/to/images
-- 单个PDF文件: path/to/pdf
-- PDF文件夹: path/to/pdfs
+Set ``inputs`` in ``configs/formula_detection.yaml`` to match your data:
+- Single image: path/to/image
+- Image directory: path/to/images
+- Single PDF file: path/to/pdf
+- PDF directory: path/to/pdfs
 
 .. note::
 
-当使用PDF作为输入时,需要将 ``formula_detection.py ``
+When the input is a PDF, replace the ``predict_images`` call in ``scripts/formula_detection.py``:
 
 .. code:: python
 
 # for image detection
 detection_results = model_formula_detection.predict_images(input_data, result_path)
 
-中的 ``predict_images`` 修改为 ``predict_pdfs``。
+with ``predict_pdfs``:
 
 .. code:: python
 
 # for pdf detection
 detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
 
 
-可视化结果查看
+Viewing Visualization Results
 --------------------
 
-当config文件中 ``visualize`` 设置为 ``True`` 时,可视化结果会保存在 ``outputs/formula_detection`` 目录下。
+When ``visualize`` is set to ``True`` in the config file, visualization results are saved in the ``outputs/formula_detection`` directory.
 
 .. note::
 
-可视化可以方便对模型结果进行分析,但当进行大批量任务时,建议关掉可视化(设置 ``visualize`` ``False``),减少内存和磁盘占用。
+Visualization makes it easier to analyze model results, but for large-scale jobs we recommend disabling it (set ``visualize`` to ``False``) to reduce memory and disk usage.
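
The ``conf_thres`` and ``iou_thres`` options in the formula detection config follow the usual YOLO-style post-processing: discard low-confidence boxes, then run non-maximum suppression (NMS) so that heavily overlapping boxes for the same formula collapse into one. The following is a minimal pure-Python sketch of that idea under stated assumptions, not PDF-Extract-Kit's implementation: the ``(x1, y1, x2, y2, score)`` box format and the names ``iou`` and ``filter_detections`` are hypothetical, and NMS is applied class-agnostically for brevity, whereas detectors usually run it per class.

```python
from typing import List, Tuple

# Hypothetical box format (an assumption): (x1, y1, x2, y2, score) in pixels.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], conf_thres: float = 0.25,
                      iou_thres: float = 0.45) -> List[Box]:
    """Confidence filtering followed by greedy, class-agnostic NMS (a sketch)."""
    # conf_thres: keep only boxes whose score is above the threshold,
    # ordered from the highest score to the lowest.
    candidates = sorted((b for b in boxes if b[4] > conf_thres),
                        key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    # iou_thres: drop any box that overlaps an already kept (higher-scoring)
    # box by more than the threshold.
    for box in candidates:
        if all(iou(box, k) <= iou_thres for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    detections = [
        (10, 10, 110, 40, 0.90),     # formula A
        (12, 11, 108, 42, 0.80),     # near-duplicate of A, removed by NMS
        (10, 60, 200, 90, 0.70),     # formula B
        (300, 300, 320, 310, 0.10),  # below conf_thres, removed
    ]
    for box in filter_detections(detections):
        print(box)
```

Raising ``iou_thres`` makes suppression more permissive (more overlapping boxes survive), while raising ``conf_thres`` trades recall for precision.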
Lines changed: 17 additions & 17 deletions
@@ -1,24 +1,24 @@
 .. _algorithm_formula_recognition:
 
 ============
-公式识别算法
+Formula Recognition Algorithm
 ============
 
-简介
+Introduction
 =================
 
-公式检测是指给定输入公式图像,识别公式图像内容并转为 ``LaTeX`` 格式。
+Formula recognition takes an image of a formula as input, recognizes its content, and converts it to ``LaTeX``.
 
-模型使用
+Model Usage
 =================
 
-在配置好环境的情况下,直接执行 ``scripts/formula_recognition.py`` 即可运行布局检测算法脚本。
+Once the environment is configured, run the formula recognition script ``scripts/formula_recognition.py``:
 
 .. code:: shell
 
 $ python scripts/formula_recognition.py --config configs/formula_recognition.yaml
 
-模型配置
+Model Configuration
 -----------------
 
 .. code:: yaml
@@ -33,20 +33,20 @@
 model_path: models/MFR/unimernet_tiny
 visualize: False
 
-- inputs/outputs: 分别定义输入文件路径和LaTeX预测结果目录
-- tasks: 定义任务类型,当前只包含一个公式识别任务
-- model: 定义具体模型类型: 当前仅提供 `UniMERNet <https://github.com/opendatalab/UniMERNet>`_ 公式识别模型
-- model_config: 定义模型配置
-- cfg_path: UniMERNet配置文件路径
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化,可视化结果会保存在outputs目录下。
+- inputs/outputs: Define the input file path and the directory for LaTeX prediction results, respectively.
+- tasks: Define the task type; currently only a formula recognition task is included.
+- model: Define the model type; currently only the `UniMERNet <https://github.com/opendatalab/UniMERNet>`_ formula recognition model is provided.
+- model_config: Define the model configuration.
+- cfg_path: Path to the UniMERNet configuration file.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results; visualizations are saved in the outputs directory.
 
-多样化输入支持
+Diverse Input Support
 -----------------
 
-PDF-Extract-Kit中的公式检测脚本支持 ``单个公式图像``、 ``文档图像及对应公式区域``
+The formula recognition script in PDF-Extract-Kit accepts two kinds of input: single formula images, and document images together with their formula regions.
 
-可视化结果查看
+Viewing Visualization Results
 -----------------
 
-当config文件中visualize设置为True时, ``LaTeX`` 预测结果会保存在outputs目录下。
+When ``visualize`` is set to ``True`` in the config file, the ``LaTeX`` predictions are saved in the ``outputs`` directory.
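
For the second input type, document images with formula regions, each region has to be cut out of the page before it can be recognized. The sketch below only computes the crop rectangles; the ``(x1, y1, x2, y2)`` pixel box format, the ``padding`` margin, and the name ``crop_boxes`` are hypothetical and not part of PDF-Extract-Kit's documented configuration.

```python
from typing import List, Tuple

# Hypothetical box format (an assumption): (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def crop_boxes(boxes: List[Box], width: int, height: int,
               padding: int = 2) -> List[Box]:
    """Expand each formula box by `padding` pixels and clamp it to the page."""
    crops: List[Box] = []
    for x1, y1, x2, y2 in boxes:
        cx1 = max(0, x1 - padding)
        cy1 = max(0, y1 - padding)
        cx2 = min(width, x2 + padding)
        cy2 = min(height, y2 + padding)
        # Skip regions that end up empty, e.g. boxes lying outside the page.
        if cx2 > cx1 and cy2 > cy1:
            crops.append((cx1, cy1, cx2, cy2))
    return crops


if __name__ == "__main__":
    # A 200x100 page: one formula at the top-left edge, one box off the page.
    print(crop_boxes([(0, 5, 50, 20), (250, 40, 260, 60)], width=200, height=100))
```

Each returned tuple is in the ``(left, upper, right, lower)`` order that Pillow's ``Image.crop`` expects, so a region could be taken with ``page.crop(box)`` on a loaded ``PIL.Image`` before it is passed to the recognizer.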
Lines changed: 73 additions & 40 deletions
@@ -1,85 +1,118 @@
 .. _algorithm_layout_detection:
 
 =================
-布局检测算法
+Layout Detection Algorithm
 =================
 
-简介
+Introduction
 =================
 
-布局检测是文档内容提取的基础任务,目标对页面中不同类型的区域进行定位:如图像、表格、文本、标题等,方便后续高质量内容提取。对于文本、标题等区域,可以基于OCR模型进行文字识别,对于表格区域可以基于表格识别模型进行转换。
+Layout detection is a fundamental task in document content extraction. It locates the different types of regions on a page, such as images, tables, text, and headings, so that each can then be extracted with high quality: text and heading regions can be passed to an OCR model for text recognition, and table regions to a table recognition model for conversion.
 
-模型使用
+Model Usage
 =================
 
-在配置好环境的情况下,直接执行``scripts/layout_detection.py``即可运行布局检测算法脚本。
+Two layout detection models are supported: layoutlmv3 and yolov10. Once the environment is set up, run ``scripts/layout_detection.py`` with the config file for the model you want to use.
+
+**1. layoutlmv3**
+
+.. code:: shell
+
+$ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
+
+**2. yolov10**
 
 .. code:: shell
 
-$ python scripts/layout_detection.py --config configs/layout_detection.yaml
+$ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
 
-模型配置
+Model Configuration
 -----------------
 
+**1. layoutlmv3**
+
+.. code:: yaml
+
+inputs: assets/demo/layout_detection
+outputs: outputs/layout_detection
+tasks:
+layout_detection:
+model: layout_detection_layoutlmv3
+model_config:
+model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: Define the input file path and the visualization output directory, respectively.
+- tasks: Define the task type; currently only a layout detection task is included.
+- model: Specify the model type, here ``layout_detection_layoutlmv3``.
+- model_config: Define the model configuration.
+- model_path: Path to the model weights.
+
+**2. yolov10**
+
+Compared with layoutlmv3, yolov10 infers faster and supports batched inference.
+
 .. code:: yaml
 
 inputs: assets/demo/layout_detection
 outputs: outputs/layout_detection
 tasks:
-layout_detection:
-model: layout_detection_yolo
-model_config:
-img_size: 1280
-conf_thres: 0.25
-iou_thres: 0.45
-batch_size: 1
-model_path: models/Layout/yolov8/yolov8_mixed_1600.pt
-visualize: True
-
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型,当前只包含一个布局检测任务
-- model: 定义具体模型类型: 如layout_detection_yolo 或者 layout_detection_layoutlmv3
-- model_config: 定义模型配置
-- img_size: 定义图像长边大小,短边会根据长边等比例缩放
-- conf_thres: 定义置信度阈值,仅检测大于该阈值的目标
-- iou_thres: 定义IoU阈值,去除重叠度大于该阈值的目标
-- batch_size: 定义批量大小,推理时每次同时推理的图像数,一般情况下越大推理速度越快,显卡越好该数值可以设置的越大
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化,可视化结果会保存在outputs目录下。
-
-多样化输入支持
+layout_detection:
+model: layout_detection_yolo
+model_config:
+img_size: 1280
+conf_thres: 0.25
+iou_thres: 0.45
+batch_size: 2
+model_path: path/to/yolov10_model
+visualize: True
+rect: True
+device: "0"
+
+- inputs/outputs: Define the input file path and the visualization output directory, respectively.
+- tasks: Define the task type; currently only a layout detection task is included.
+- model: Specify the model type, here ``layout_detection_yolo``.
+- model_config: Define the model configuration.
+- img_size: Define the length of the image's long edge (default 1280); the short edge is scaled proportionally.
+- conf_thres: Define the confidence threshold; only detections scoring above this threshold are kept.
+- iou_thres: Define the IoU threshold; overlapping detections with an IoU above this value are treated as duplicates and removed.
+- batch_size: Define the batch size, i.e., the number of images processed together in one inference pass. Larger batches generally run faster, and GPUs with more memory can handle larger batches.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results; visualizations are saved in the outputs directory.
+- rect: Whether to enable rectangular inference (default ``True``). If ``True``, the images in a batch are scaled with their aspect ratio preserved and then padded to a common size; if ``False``, every image in the batch is resized to (img_size, img_size).
+
+Diverse Input Support
 -----------------
 
-PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录``等输入形式。
+The layout detection script in PDF-Extract-Kit accepts a single image, a directory containing only image files, a single PDF file, or a directory containing only PDF files.
 
 .. note::
 
-根据自己实际数据形式,修改configs/layout_detection.yaml中inputs的路径即可
-- 单个图像: path/to/image
-- 图像文件夹: path/to/images
-- 单个PDF文件: path/to/pdf
-- PDF文件夹: path/to/pdfs
+Set ``inputs`` in the config file you are using (``configs/layout_detection_layoutlmv3.yaml`` or ``configs/layout_detection_yolo.yaml``) to match your data:
+- Single image: path/to/image
+- Image directory: path/to/images
+- Single PDF file: path/to/pdf
+- PDF directory: path/to/pdfs
 
 .. note::
-当使用PDF作为输入时,需要将 ``formula_detection.py``
+When the input is a PDF, replace the ``predict_images`` call in ``scripts/layout_detection.py``:
 
 .. code:: python
 
 # for image detection
 detection_results = model_layout_detection.predict_images(input_data, result_path)
 
-中的 ``predict_images``修改为 ``predict_pdfs``。
+with ``predict_pdfs``:
 
 .. code:: python
 
 # for pdf detection
 detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
 
-可视化结果查看
+Viewing Visualization Results
 -----------------
 
-当config文件中 ``visualize`` 设置为 ``True`` 时,可视化结果会保存在 ``outputs`` 目录下。
+When ``visualize`` is set to ``True`` in the config file, visualization results are saved in the ``outputs`` directory.
 
 .. note::
 
-可视化可以方便对模型结果进行分析,但当进行大批量任务时,建议关掉可视化(设置 ``visualize````False``),减少内存和磁盘占用。
+Visualization makes it easier to analyze model results, but for large-scale jobs we recommend disabling it (set ``visualize`` to ``False``) to reduce memory and disk usage.
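
The interplay between ``img_size`` and ``rect`` in the yolov10 layout config is mostly shape arithmetic. The following sketch follows the behaviour described for those two options: with ``rect: True`` each image is scaled so its long edge equals ``img_size`` and the batch is padded to the largest scaled height and width; with ``rect: False`` every image becomes ``img_size`` x ``img_size``. The function names and the whole-pixel rounding are assumptions, and real YOLO pipelines typically also round padded sizes up to a multiple of the model stride, which is omitted here.

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (height, width) in pixels


def scale_long_edge(h: int, w: int, img_size: int = 1280) -> Shape:
    """Scale so the longer edge equals img_size, preserving the aspect ratio."""
    r = img_size / max(h, w)
    return round(h * r), round(w * r)


def batch_input_shapes(shapes: List[Shape], img_size: int = 1280,
                       rect: bool = True) -> List[Tuple[Shape, Shape]]:
    """Return (resized_shape, padded_shape) for each image in one batch."""
    if not rect:
        # rect: False -> every image is resized to a square img_size x img_size input.
        square = (img_size, img_size)
        return [(square, square) for _ in shapes]
    # rect: True -> keep each aspect ratio, then pad all images to the
    # largest scaled height and width found in this batch.
    scaled = [scale_long_edge(h, w, img_size) for h, w in shapes]
    padded = (max(h for h, _ in scaled), max(w for _, w in scaled))
    return [(s, padded) for s in scaled]


if __name__ == "__main__":
    # batch_size: 2 with one portrait and one landscape page.
    pages = [(2000, 1000), (1000, 2000)]
    for resized, padded in batch_input_shapes(pages, img_size=1280, rect=True):
        print(resized, padded)
```

Because padding is shared across a batch, mixing portrait and landscape pages in one batch with ``rect: True`` can still produce fairly large padded inputs; batches of similarly shaped pages pad less.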

docs/en/algorithm/ocr.rst

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 .. _algorithm_ocr:
 ==========================
-光学字符识别(OCR)算法
+OCR (Optical Character Recognition) Algorithm
 ==========================
 Coming soon.

docs/en/algorithm/reading_order.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 .. _algorithm_reading_oder:
 ==============
-阅读顺序算法
+Reading Order Algorithm
 ==============
 
 Coming soon.
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 .. _algorithm_table_recognition:
 =================
-表格识别算法
+Table Recognition Algorithm
 =================
 
 Coming soon.
