opendatalab
diff --git a/docs/en/.readthedocs copy.yaml b/docs/en/.readthedocs copy.yaml
Lines changed: 0 additions & 16 deletions
diff --git a/docs/en/_static/image/logo.png b/docs/en/_static/image/logo.png
57.9 KB
diff --git a/docs/en/algorithm/formula_detection.rst b/docs/en/algorithm/formula_detection.rst
Lines changed: 30 additions & 31 deletions
diff --git a/docs/en/algorithm/formula_recognition.rst b/docs/en/algorithm/formula_recognition.rst
Lines changed: 17 additions & 17 deletions
diff --git a/docs/en/algorithm/layout_detection.rst b/docs/en/algorithm/layout_detection.rst
Lines changed: 73 additions & 40 deletions
diff --git a/docs/en/algorithm/ocr.rst b/docs/en/algorithm/ocr.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/en/algorithm/reading_order.rst b/docs/en/algorithm/reading_order.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/en/algorithm/table_recognition.rst b/docs/en/algorithm/table_recognition.rst
Lines changed: 1 addition & 1 deletion
@@ -1,29 +1,28 @@
 ..  _algorithm_formula_detection:
 
 ====================
-公式检测算法
+Formula Detection Algorithm
 ====================
 
-简介
+Introduction
 ====================
 
-公式检测是针对给定的输入图像，检测出图像中所有包含公式的位置（包含行内公式和行间公式）
+Formula detection involves identifying the positions of all formulas (including inline and block formulas) in a given input image.
 
 .. note::
 
-   公式检测实际上属于布局检测子任务，但由于公式检查的复杂性，我们建议使用单独的公式检测模型解耦。
-   这样通常使得数据标注更加方便，且公式检测效果也更好。
+   Formula detection is technically a subtask of layout detection. However, due to its complexity, we recommend using a dedicated formula detection model to decouple it. This approach typically makes data annotation easier and improves detection performance.
 
-模型使用
+Model Usage
 ====================
 
-在配置好环境的情况下，直接执行 ``scripts/formula_detection.py`` 即可运行布局检测算法脚本。
+With the environment properly set up, run the formula detection script by executing ``scripts/formula_detection.py``.
 
 .. code:: shell
 
    $ python scripts/formula_detection.py --config configs/formula_detection.yaml
 
-模型配置
+Model Configuration
 --------------------
 
 .. code:: yaml
@@ -41,52 +40,52 @@
             model_path: models/MFD/yolov8/weights.pt
             visualize: True
 
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型，当前只包含一个公式检测任务
-- model: 定义具体模型类型: 当前仅提供YOLO公式检测模型
-- model_config: 定义模型配置
-- img_size: 定义图像长边大小，短边会根据长边等比例缩放
-- conf_thres: 定义置信度阈值，仅检测大于该阈值的目标
-- iou_thres: 定义IoU阈值，去除重叠度大于该阈值的目标
-- batch_size: 定义批量大小，推理时每次同时推理的图像数，一般情况下越大推理速度越快，显卡越好该数值可以设置的越大
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化，可视化结果会保存在outputs目录下。
-
-多样化输入支持
+- inputs/outputs: Define the input file path and the visualization output directory, respectively.
+- tasks: Define the task type; currently only a formula detection task is included.
+- model: Define the specific model type: currently, only the YOLO formula detection model is available.
+- model_config: Define the model configuration.
+- img_size: Define the image's longer side size; the shorter side will be scaled proportionally.
+- conf_thres: Define the confidence threshold; only targets above this threshold will be detected.
+- iou_thres: Define the IoU threshold to remove targets with an overlap greater than this value.
+- batch_size: Define the batch size; the number of images inferred simultaneously. Generally, the larger the batch size, the faster the inference speed. A better GPU allows for a larger batch size.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results. Visualized results will be saved in the outputs directory.
+
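The interaction between ``conf_thres`` and ``iou_thres`` can be illustrated with a small, dependency-free sketch. This is not the code PDF-Extract-Kit or YOLO actually runs; it is a generic greedy non-maximum suppression, and the function names, box format ``(x1, y1, x2, y2)``, and threshold values are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: List[Box],
    scores: List[float],
    conf_thres: float = 0.25,  # illustrative value
    iou_thres: float = 0.45,   # illustrative value
) -> List[int]:
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # Keep only candidates scoring above conf_thres, highest score first.
    order = sorted(
        (i for i, s in enumerate(scores) if s > conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Drop a box that overlaps an already-kept box by more than iou_thres.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_detections(boxes, scores))  # prints [0, 2]
```

In this example the last box is discarded by the confidence threshold, and the second box is suppressed because it overlaps the higher-scoring first box with an IoU of about 0.68. Raising ``iou_thres`` keeps more overlapping boxes; raising ``conf_thres`` keeps fewer low-confidence ones.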
+Diverse Input Support
 --------------------
 
-PDF-Extract-Kit中的公式检测脚本支持 ``单个图像`` 、 ``只包含图像文件的目录`` 、 ``单个PDF文件`` 、 ``只包含PDF文件的目录`` 等输入形式。
+The formula detection script in PDF-Extract-Kit supports various input formats such as ``a single image``, ``a directory of image files``, ``a single PDF file``, and ``a directory of PDF files``.
 
 .. note:: 
 
-   根据自己实际数据形式，修改configs/formula_detection.yaml中inputs的路径即可
-   - 单个图像: path/to/image  
-   - 图像文件夹: path/to/images  
-   - 单个PDF文件: path/to/pdf  
-   - PDF文件夹: path/to/pdfs  
+   Modify the ``inputs`` path in ``configs/formula_detection.yaml`` according to your actual data format:
+   - Single image: path/to/image  
+   - Image directory: path/to/images  
+   - Single PDF file: path/to/pdf  
+   - PDF directory: path/to/pdfs  
 
 .. note::
 
-   当使用PDF作为输入时，需要将 ``formula_detection.py ``
+   When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``formula_detection.py``.
 
    .. code:: python
 
       # for image detection
       detection_results = model_formula_detection.predict_images(input_data, result_path)
    
-   中的 ``predict_images`` 修改为 ``predict_pdfs``。
+   Change to:
 
    .. code:: python
 
       # for pdf detection
       detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
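
Instead of editing the script by hand, the choice between ``predict_images`` and ``predict_pdfs`` could be made from the input path. The following is only a sketch under stated assumptions: ``choose_predict_method`` and ``IMAGE_SUFFIXES`` are hypothetical names that do not exist in PDF-Extract-Kit, and the set of image extensions the real script accepts may differ.

```python
from pathlib import Path

# Assumed image extensions; the real script may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def choose_predict_method(input_path: str) -> str:
    """Return 'predict_pdfs' or 'predict_images' for a file or directory path."""
    path = Path(input_path)
    # For a directory, inspect the files it contains; otherwise the path itself.
    files = [p for p in path.iterdir() if p.is_file()] if path.is_dir() else [path]
    suffixes = {p.suffix.lower() for p in files}
    if suffixes and suffixes <= {".pdf"}:
        return "predict_pdfs"
    if suffixes and suffixes <= IMAGE_SUFFIXES:
        return "predict_images"
    raise ValueError(f"Unsupported or mixed input: {input_path}")


print(choose_predict_method("path/to/document.pdf"))  # prints predict_pdfs
print(choose_predict_method("path/to/page.png"))      # prints predict_images
```

The returned name could then be looked up on the model object, for example ``getattr(model_formula_detection, method_name)(input_data, result_path)``. A directory that mixes images and PDFs raises an error here, matching the documented requirement that a directory contain only one kind of file.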
 
 
-可视化结果查看
+Viewing Visualization Results
 --------------------
 
-当config文件中 ``visualize`` 设置为 ``True`` 时，可视化结果会保存在 ``outputs/formula_detection`` 目录下。
+When the ``visualize`` option in the config file is set to ``True``, visualization results will be saved in the ``outputs/formula_detection`` directory.
 
 .. note::
 
-   可视化可以方便对模型结果进行分析，但当进行大批量任务时，建议关掉可视化(设置 ``visualize`` 为 ``False``)，减少内存和磁盘占用。
+   Visualization facilitates the analysis of model results. However, for large-scale tasks, it is recommended to disable visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.
@@ -1,24 +1,24 @@
 ..  _algorithm_formula_recognition:
 
 ============
-公式识别算法
+Formula Recognition Algorithm
 ============
 
-简介
+Introduction
 =================
 
-公式检测是指给定输入公式图像，识别公式图像内容并转为 ``LaTeX`` 格式。
+Formula recognition takes a formula image as input, recognizes its content, and converts it to ``LaTeX`` format.
 
-模型使用
+Model Usage
 =================
 
-在配置好环境的情况下，直接执行 ``scripts/formula_recognition.py`` 即可运行布局检测算法脚本。
+With the environment properly configured, you can run the formula recognition script by executing ``scripts/formula_recognition.py``.
 
 .. code:: shell
 
    $ python scripts/formula_recognition.py --config configs/formula_recognition.yaml
 
-模型配置
+Model Configuration
 -----------------
 
 .. code:: yaml
@@ -33,20 +33,20 @@
             model_path: models/MFR/unimernet_tiny
             visualize: False
 
-- inputs/outputs: 分别定义输入文件路径和LaTeX预测结果目录
-- tasks: 定义任务类型，当前只包含一个公式识别任务
-- model: 定义具体模型类型: 当前仅提供 `UniMERNet <https://github.com/opendatalab/UniMERNet>`_ 公式识别模型
-- model_config: 定义模型配置
-- cfg_path: UniMERNet配置文件路径
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化，可视化结果会保存在outputs目录下。
+- inputs/outputs: Define the input file path and the directory for LaTeX prediction results, respectively.
+- tasks: Define the task type; currently only a formula recognition task is included.
+- model: Define the specific model type: Currently, only the `UniMERNet <https://github.com/opendatalab/UniMERNet>`_ formula recognition model is provided.
+- model_config: Define the model configuration.
+- cfg_path: Path to the UniMERNet configuration file.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results. Visualized results will be saved in the outputs directory.
 
-多样化输入支持
+Support for Diverse Inputs
 -----------------
 
-PDF-Extract-Kit中的公式检测脚本支持 ``单个公式图像``、 ``文档图像及对应公式区域``
+The formula recognition script in PDF-Extract-Kit supports ``single formula images`` and ``document images with corresponding formula regions``.
 
-可视化结果查看
+Viewing Visualization Results
 -----------------
 
-当config文件中visualize设置为True时， ``LaTeX`` 预测结果会保存在outputs目录下。
+When ``visualize`` is set to ``True`` in the config file, ``LaTeX`` prediction results will be saved in the ``outputs`` directory.
@@ -1,85 +1,118 @@
 .. _algorithm_layout_detection:
 
 =================
-布局检测算法
+Layout Detection Algorithm
 =================
 
-简介
+Introduction
 =================
 
-布局检测是文档内容提取的基础任务，目标对页面中不同类型的区域进行定位：如图像、表格、文本、标题等，方便后续高质量内容提取。对于文本、标题等区域，可以基于OCR模型进行文字识别，对于表格区域可以基于表格识别模型进行转换。
+Layout detection is a fundamental task in document content extraction, aiming to locate different types of regions on a page, such as images, tables, text, and headings, to facilitate high-quality content extraction. For text and heading regions, OCR models can be used for text recognition, while table regions can be converted using table recognition models.
 
-模型使用
+Model Usage
 =================
 
-在配置好环境的情况下，直接执行``scripts/layout_detection.py``即可运行布局检测算法脚本。
+The layout detection model supports layoutlmv3 and yolov10. Once the environment is set up, you can run the layout detection script by executing ``scripts/layout_detection.py``.
+
+**1. layoutlmv3**
+
+.. code:: shell
+
+   $ python scripts/layout_detection.py --config configs/layout_detection_layoutlmv3.yaml
+   
+**2. yolov10**
 
 .. code:: shell
 
-   $ python scripts/layout_detection.py --config configs/layout_detection.yaml
+   $ python scripts/layout_detection.py --config configs/layout_detection_yolo.yaml
 
-模型配置
+Model Configuration
 -----------------
 
+**1. layoutlmv3**
+
+.. code:: yaml
+
+    inputs: assets/demo/layout_detection
+    outputs: outputs/layout_detection
+    tasks:
+      layout_detection:
+        model: layout_detection_layoutlmv3
+        model_config:
+          model_path: path/to/layoutlmv3_model
+
+- inputs/outputs: Define the input file path and the directory for visualization output.
+- tasks: Define the task type; currently only a layout detection task is included.
+- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
+- model_config: Define the model configuration.
+- model_path: Path to the model weights.
+
+**2. yolov10**
+
+Compared to layoutlmv3, yolov10 has faster inference speed and supports batch mode inference.
+
 .. code:: yaml
 
     inputs: assets/demo/layout_detection
     outputs: outputs/layout_detection
     tasks:
-        layout_detection:
-            model: layout_detection_yolo
-            model_config:
-               img_size: 1280
-               conf_thres: 0.25
-               iou_thres: 0.45
-               batch_size: 1
-               model_path: models/Layout/yolov8/yolov8_mixed_1600.pt
-               visualize: True
-
-- inputs/outputs: 分别定义输入文件路径和可视化输出目录
-- tasks: 定义任务类型，当前只包含一个布局检测任务
-- model: 定义具体模型类型: 如layout_detection_yolo 或者 layout_detection_layoutlmv3
-- model_config: 定义模型配置
-- img_size: 定义图像长边大小，短边会根据长边等比例缩放
-- conf_thres: 定义置信度阈值，仅检测大于该阈值的目标
-- iou_thres: 定义IoU阈值，去除重叠度大于该阈值的目标
-- batch_size: 定义批量大小，推理时每次同时推理的图像数，一般情况下越大推理速度越快，显卡越好该数值可以设置的越大
-- model_path: 模型权重路径
-- visualize: 是否对模型结果进行可视化，可视化结果会保存在outputs目录下。
-
-多样化输入支持
+      layout_detection:
+        model: layout_detection_yolo
+        model_config:
+          img_size: 1280
+          conf_thres: 0.25
+          iou_thres: 0.45
+          batch_size: 2
+          model_path: path/to/yolov10_model
+          visualize: True
+          rect: True
+          device: "0"
+
+- inputs/outputs: Define the input file path and the directory for visualization output.
+- tasks: Define the task type; currently only a layout detection task is included.
+- model: Specify the specific model type, e.g., layout_detection_yolo.
+- model_config: Define the model configuration.
+- img_size: Define the image long edge size; the short edge will be scaled proportionally based on the long edge, with the default long edge being 1280.
+- conf_thres: Define the confidence threshold, detecting only targets above this threshold.
+- iou_thres: Define the IoU threshold, removing targets with an overlap greater than this threshold.
+- batch_size: Define the batch size, the number of images inferred simultaneously during inference. Generally, the larger the batch size, the faster the inference speed; a better GPU allows for a larger batch size.
+- model_path: Path to the model weights.
+- visualize: Whether to visualize the model results; visualized results will be saved in the outputs directory.
+- rect: Whether to enable rectangular inference, default is True. If set to True, images in the same batch will be scaled while maintaining aspect ratio and padded to the same size; if False, all images in the same batch will be resized to (img_size, img_size) for inference.
+
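The effect of ``img_size`` and ``rect`` on the tensor shapes in a batch can be sketched as follows. This is a simplified illustration, not the resizing code used by the model: ``scale_long_edge`` and ``batch_shapes`` are hypothetical helpers, sizes are ``(width, height)`` tuples, and real rectangular inference may additionally round the padded size up to a multiple of the network stride.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height), an assumed convention


def scale_long_edge(size: Size, img_size: int = 1280) -> Size:
    """Scale (width, height) so the longer edge equals img_size, keeping aspect ratio."""
    w, h = size
    ratio = img_size / max(w, h)
    return round(w * ratio), round(h * ratio)


def batch_shapes(
    sizes: List[Size], img_size: int = 1280, rect: bool = True
) -> Tuple[List[Size], Size]:
    """Return the per-image scaled sizes and the common canvas size for the batch."""
    if not rect:
        # Every image is resized to a square, ignoring its aspect ratio.
        return [(img_size, img_size)] * len(sizes), (img_size, img_size)
    scaled = [scale_long_edge(s, img_size) for s in sizes]
    # Pad each image up to the largest width and height found in the batch.
    canvas = (max(w for w, _ in scaled), max(h for _, h in scaled))
    return scaled, canvas


pages = [(1700, 2200), (1650, 2200)]  # two portrait pages
print(batch_shapes(pages, rect=True))   # ([(989, 1280), (960, 1280)], (989, 1280))
print(batch_shapes(pages, rect=False))  # ([(1280, 1280), (1280, 1280)], (1280, 1280))
```

For a batch of portrait pages, ``rect=True`` yields a 989 x 1280 canvas instead of 1280 x 1280, so fewer pixels are processed and the page content is not distorted.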
+Diverse Input Support
 -----------------
 
-PDF-Extract-Kit中的布局检测脚本支持 ``单个图像``、 ``只包含图像文件的目录``、 ``单个PDF文件``、 ``只包含PDF文件的目录``等输入形式。
+The layout detection script in PDF-Extract-Kit supports input formats such as a ``single image``, a ``directory containing only image files``, a ``single PDF file``, and a ``directory containing only PDF files``.
 
 .. note::
 
-   根据自己实际数据形式，修改configs/layout_detection.yaml中inputs的路径即可
-   - 单个图像: path/to/image  
-   - 图像文件夹: path/to/images  
-   - 单个PDF文件: path/to/pdf  
-   - PDF文件夹: path/to/pdfs  
+   Modify the path to inputs in configs/layout_detection.yaml according to your actual data format:
+   - Single image: path/to/image  
+   - Image directory: path/to/images  
+   - Single PDF file: path/to/pdf  
+   - PDF directory: path/to/pdfs  
 
 .. note::
-   当使用PDF作为输入时，需要将 ``formula_detection.py``
+   When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``layout_detection.py``.
 
    .. code:: python
 
       # for image detection
       detection_results = model_layout_detection.predict_images(input_data, result_path)
 
-   中的 ``predict_images``修改为 ``predict_pdfs``。
+   Change to:
 
    .. code:: python
 
       # for pdf detection
       detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
 
-可视化结果查看
+Viewing Visualization Results
 -----------------
 
-当config文件中 ``visualize`` 设置为 ``True`` 时，可视化结果会保存在 ``outputs`` 目录下。
+When ``visualize`` is set to ``True`` in the config file, the visualization results will be saved in the ``outputs`` directory.
 
 .. note::
 
-   可视化可以方便对模型结果进行分析，但当进行大批量任务时，建议关掉可视化(设置 ``visualize``为 ``False``)，减少内存和磁盘占用。
+   Visualization is helpful for analyzing model results, but for large-scale tasks, it is recommended to turn off visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.
@@ -1,5 +1,5 @@
 ..  _algorithm_ocr:
 ==========================
-光学字符识别(OCR)算法
+OCR (Optical Character Recognition) Algorithm
 ==========================
 Coming soon.
@@ -1,6 +1,6 @@
 ..  _algorithm_reading_oder:
 ==============
-阅读顺序算法
+Reading Order Algorithm
 ==============
 
 Coming soon.
@@ -1,6 +1,6 @@
 ..  _algorithm_table_recognition:
 =================
-表格识别算法
+Table Recognition Algorithm
 =================
 
 Coming soon.