.. _algorithm_layout_detection:

==========================
Layout Detection Algorithm
==========================

Introduction
=================

Layout detection is a fundamental task in document content extraction. Its goal is to locate the different types of regions on a page, such as images, tables, text, and headings, so that their content can then be extracted with high quality. Text and heading regions can be passed to an OCR model for text recognition, while table regions can be converted with a table recognition model.

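
As an illustration of that hand-off, the following minimal Python sketch groups detected regions by the downstream step that would process them. The ``Region`` class, the category names, and the route labels are hypothetical; PDF-Extract-Kit's actual detection output format may differ.

.. code:: python

   from dataclasses import dataclass
   from typing import Dict, List, Tuple


   @dataclass
   class Region:
       """Hypothetical detection record: category label, box and confidence."""
       category: str                             # e.g. "title", "text", "table", "figure"
       bbox: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
       score: float


   # Assumed mapping from region category to the downstream step.
   ROUTES = {"title": "ocr", "text": "ocr", "table": "table_recognition"}


   def route_regions(regions: List[Region]) -> Dict[str, List[Region]]:
       """Group regions by downstream step; unmapped categories are kept as-is."""
       grouped: Dict[str, List[Region]] = {"ocr": [], "table_recognition": [], "keep": []}
       for region in regions:
           grouped[ROUTES.get(region.category, "keep")].append(region)
       return grouped


   regions = [
       Region("title", (50, 40, 500, 80), 0.93),
       Region("text", (50, 100, 550, 400), 0.88),
       Region("table", (50, 420, 550, 700), 0.91),
       Region("figure", (60, 720, 540, 950), 0.85),
   ]
   grouped = route_regions(regions)
   print({step: [r.category for r in rs] for step, rs in grouped.items()})
   # {'ocr': ['title', 'text'], 'table_recognition': ['table'], 'keep': ['figure']}
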
Model Usage
=================

The layout detection module supports two models, layoutlmv3 and yolov10. Once the environment is set up, run the layout detection script ``scripts/layout_detection.py`` with the config file for the model you want to use.

**1. layoutlmv3**

.. code:: shell

   $ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml

**2. yolov10**

.. code:: shell

   $ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml

Model Configuration
-------------------

**1. layoutlmv3**

.. code:: yaml

   inputs: assets/demo/layout_detection
   outputs: outputs/layout_detection
   tasks:
     layout_detection:
       model: layout_detection_layoutlmv3
       model_config:
         model_path: path/to/layoutlmv3_model

- inputs/outputs: The input file path and the output directory for visualization results.
- tasks: The task type; currently only a layout detection task is included.
- model: The specific model type, here ``layout_detection_layoutlmv3``.
- model_config: The model configuration.
- model_path: Path to the model weights.

**2. yolov10**

Compared to layoutlmv3, yolov10 offers faster inference and supports batch inference.

.. code:: yaml

   inputs: assets/demo/layout_detection
   outputs: outputs/layout_detection
   tasks:
     layout_detection:
       model: layout_detection_yolo
       model_config:
         img_size: 1280
         conf_thres: 0.25
         iou_thres: 0.45
         batch_size: 2
         model_path: path/to/yolov10_model
         visualize: True
         rect: True
         device: "0"

- inputs/outputs: The input file path and the output directory for visualization results.
- tasks: The task type; currently only a layout detection task is included.
- model: The specific model type, here ``layout_detection_yolo``.
- model_config: The model configuration.
- img_size: The size of the image's long edge; the short edge is scaled proportionally. The default long edge is 1280.
- conf_thres: The confidence threshold; only detections with a confidence above this value are kept.
- iou_thres: The IoU threshold used in non-maximum suppression; when two detections overlap with an IoU above this value, only the higher-scoring one is kept.
- batch_size: The number of images processed together in one inference pass. Larger batches are generally faster, and a GPU with more memory can handle a larger batch size.
- model_path: Path to the model weights.
- visualize: Whether to visualize the model results; visualizations are saved in the outputs directory.
- rect: Whether to enable rectangular inference (default ``True``). If ``True``, images in the same batch are scaled with their aspect ratio preserved and padded to a common size; if ``False``, every image in the batch is resized to ``(img_size, img_size)``.
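
The ``conf_thres`` and ``iou_thres`` parameters correspond to the standard post-processing of YOLO-style detectors. The sketch below illustrates it under the assumption of plain, class-agnostic greedy non-maximum suppression (NMS): detections at or below ``conf_thres`` are discarded, and a detection is suppressed when its IoU with a higher-scoring kept detection exceeds ``iou_thres``. The helper names are hypothetical, and the model performs this step internally, possibly with differences such as per-class NMS.

.. code:: python

   from typing import List, Tuple

   Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


   def iou(a: Box, b: Box) -> float:
       """Intersection over union of two axis-aligned boxes."""
       ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
       ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
       inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
       area_a = (a[2] - a[0]) * (a[3] - a[1])
       area_b = (b[2] - b[0]) * (b[3] - b[1])
       union = area_a + area_b - inter
       return inter / union if union > 0 else 0.0


   def filter_detections(boxes: List[Box], scores: List[float],
                         conf_thres: float = 0.25, iou_thres: float = 0.45) -> List[int]:
       """Return indices of boxes kept after confidence filtering and greedy NMS."""
       # 1. Confidence filter: keep only candidates scoring above conf_thres.
       candidates = [i for i, s in enumerate(scores) if s > conf_thres]
       # 2. Visit the remaining candidates from highest to lowest score.
       candidates.sort(key=lambda i: scores[i], reverse=True)
       kept: List[int] = []
       for i in candidates:
           # 3. Keep a box only if it does not overlap any kept box too much.
           if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
               kept.append(i)
       return kept


   boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300), (0, 0, 50, 50)]
   scores = [0.90, 0.80, 0.70, 0.10]
   print(filter_detections(boxes, scores))  # [0, 2]

Raising ``iou_thres`` keeps more overlapping boxes (with ``iou_thres=0.9`` the first two boxes above both survive), while raising ``conf_thres`` drops more low-confidence regions.
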
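
The ``img_size`` and ``rect`` options together determine the shape of each inference batch. The following sketch computes the resized size of each page and the padded size of the batch. It assumes that padded dimensions are rounded up to a multiple of 32 (a stride commonly used by YOLO models, assumed here rather than taken from this project); the helper names are hypothetical and the actual preprocessing may round or pad differently.

.. code:: python

   import math
   from typing import List, Tuple

   Size = Tuple[int, int]  # (width, height)


   def scaled_size(w: int, h: int, img_size: int = 1280) -> Size:
       """Scale (w, h) so that the long edge equals img_size, keeping the aspect ratio."""
       scale = img_size / max(w, h)
       return round(w * scale), round(h * scale)


   def batch_input_shapes(sizes: List[Size], img_size: int = 1280,
                          rect: bool = True, stride: int = 32) -> Tuple[List[Size], Size]:
       """Return the per-image resized sizes and the common padded size of the batch."""
       if not rect:
           # rect=False: every image is resized directly to (img_size, img_size).
           return [(img_size, img_size)] * len(sizes), (img_size, img_size)
       # rect=True: keep each image's aspect ratio ...
       resized = [scaled_size(w, h, img_size) for w, h in sizes]
       # ... then pad the batch to the largest width/height, rounded up to a
       # multiple of `stride` (assumed value of 32).
       pad_w = math.ceil(max(w for w, _ in resized) / stride) * stride
       pad_h = math.ceil(max(h for _, h in resized) / stride) * stride
       return resized, (pad_w, pad_h)


   # Two portrait pages rendered at different resolutions, as (width, height).
   pages = [(1654, 2339), (827, 1170)]
   print(batch_input_shapes(pages, rect=True))   # ([(905, 1280), (905, 1280)], (928, 1280))
   print(batch_input_shapes(pages, rect=False))  # ([(1280, 1280), (1280, 1280)], (1280, 1280))

In this example, ``rect: True`` pads both pages to 928 by 1280 instead of stretching them to 1280 by 1280, so the page proportions are preserved and fewer padded pixels are processed.
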

Diverse Input Support
---------------------

The layout detection script in PDF-Extract-Kit accepts the following input forms: a ``single image``, a ``directory containing only image files``, a ``single PDF file``, or a ``directory containing only PDF files``.

.. note::

   Set the ``inputs`` path in the config file you are using (``configs/layout_detection_layoutlmv3.yaml`` or ``configs/layout_detection_yolo.yaml``) to match your data:

   - Single image: ``path/to/image``
   - Image directory: ``path/to/images``
   - Single PDF file: ``path/to/pdf``
   - PDF directory: ``path/to/pdfs``

.. note::

   When using PDFs as input, replace ``predict_images`` with ``predict_pdfs`` in ``scripts/layout_detection.py``. Change:

   .. code:: python

      # for image detection
      detection_results = model_layout_detection.predict_images(input_data, result_path)

   to:

   .. code:: python

      # for pdf detection
      detection_results = model_layout_detection.predict_pdfs(input_data, result_path)

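
Because images and PDFs go through different methods, a small wrapper could inspect the configured ``inputs`` path and choose between ``predict_images`` and ``predict_pdfs`` instead of editing the call by hand. This is a hypothetical sketch: ``classify_input``, ``run_layout_detection``, and the list of image extensions are illustrative assumptions; only the two method names come from the script described above.

.. code:: python

   import os
   from typing import Any

   # Assumed set of image extensions; adjust to the formats you actually use.
   IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


   def classify_input(path: str) -> str:
       """Classify an inputs path as 'image', 'image_dir', 'pdf' or 'pdf_dir'."""
       if os.path.isdir(path):
           exts = {os.path.splitext(name)[1].lower() for name in os.listdir(path)}
           if exts and exts <= {".pdf"}:
               return "pdf_dir"
           if exts and exts <= IMAGE_EXTS:
               return "image_dir"
           raise ValueError(f"{path!r} must contain only image files or only PDF files")
       ext = os.path.splitext(path)[1].lower()
       if ext == ".pdf":
           return "pdf"
       if ext in IMAGE_EXTS:
           return "image"
       raise ValueError(f"unsupported input: {path!r}")


   def run_layout_detection(model: Any, input_data: str, result_path: str) -> Any:
       """Call predict_pdfs for PDF inputs and predict_images for image inputs."""
       if classify_input(input_data) in ("pdf", "pdf_dir"):
           return model.predict_pdfs(input_data, result_path)
       return model.predict_images(input_data, result_path)

Inside the script, such a helper could replace the hard-coded call, e.g. ``detection_results = run_layout_detection(model_layout_detection, input_data, result_path)``.
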

Viewing Visualization Results
-----------------------------

When ``visualize`` is set to ``True`` in the config file, the visualization results are saved in the ``outputs`` directory.

.. note::

   Visualization is helpful for analyzing model results, but for large-scale jobs it is recommended to turn it off (set ``visualize`` to ``False``) to reduce memory and disk usage.