Skip to content

ch_ppocr_mobile_v2_0_rec_v2_0模型在arm7hf平台上的输入tensor的shape不符合预期 #10658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lqian opened this issue May 5, 2025 · 3 comments
Assignees

Comments

@lqian
Copy link

lqian commented May 5, 2025

为使您的问题得到快速解决,在建立 Issue 前,请您先通过如下方式搜索是否有相似问题: 历史 issue, FAQ 文档, 官方文档

建立 issue 时,为快速解决问题,请您根据使用情况给出如下信息:

Paddle-Lite/build.opt/lite/api/opt --model_file model.pdmodel --param_file model.pdiparams --optimize_out_type naive_buffer --optimize_out ./ch_ppocr_mobile_v2_0_rec_v2_0 --valid_targets arm 
Loading topology data from model.pdmodel
Loading params data from model.pdiparams
1. Model is successfully loaded!
2. Model is optimized and saved into ./ch_ppocr_mobile_v2_0_rec_v2_0.nb successfully

./build/PADDLE-LITE-TEST ./ch_ppocr_mobile_v2_0_rec_v2_0.nb  ./303.jpg ./ppocr_keys_v1.txt 
load 6623 dictionary tokens
[I  1/ 1  1:57:22.214 ...ace/Paddle-Lite/lite/core/device_info.cc:238 get_cpu_arch] Unknow cpu arch: 3079
[I  1/ 1  1:57:22.216 ...ace/Paddle-Lite/lite/core/device_info.cc:1118 Setup] ARM multiprocessors name: MODEL NAME      : ARMV7 PROCESSOR REV 5 (V7L)
HARDWARE        : LOMBOTECH-N7 (FLATTENED DEVICE TREE)

[I  1/ 1  1:57:22.218 ...ace/Paddle-Lite/lite/core/device_info.cc:1119 Setup] ARM multiprocessors number: 1
[I  1/ 1  1:57:22.218 ...ace/Paddle-Lite/lite/core/device_info.cc:1121 Setup] ARM multiprocessors ID: 0, max freq: 0, min freq: 0, cluster ID: 0, CPU ARCH: A-1
[I  1/ 1  1:57:22.218 ...ace/Paddle-Lite/lite/core/device_info.cc:1127 Setup] L1 DataCache size is: 
[I  1/ 1  1:57:22.218 ...ace/Paddle-Lite/lite/core/device_info.cc:1129 Setup] 32 KB
[I  1/ 1  1:57:22.219 ...ace/Paddle-Lite/lite/core/device_info.cc:1131 Setup] L2 Cache size is: 
[I  1/ 1  1:57:22.219 ...ace/Paddle-Lite/lite/core/device_info.cc:1133 Setup] 512 KB
[I  1/ 1  1:57:22.220 ...ace/Paddle-Lite/lite/core/device_info.cc:1135 Setup] L3 Cache size is: 
[I  1/ 1  1:57:22.220 ...ace/Paddle-Lite/lite/core/device_info.cc:1137 Setup] 0 KB
[I  1/ 1  1:57:22.220 ...ace/Paddle-Lite/lite/core/device_info.cc:1139 Setup] Total memory: 59556KB
create predictor :4198012 
input name: x
--------- showed input names ---------
--------- GetInputByName("x")---------
get input tensor shape dimension: 4 
input tensor shape dimension: 0 1 0 3
predictor->Run() before
predictor->Run() done
output predict shape 0 0 0
output name: save_infer_model/scale_0.tmp_1
--------- showed output names ---------
inference cost: 239.409# 


  • 问题描述:下载的paddlepaddle文件显示ch_ppocr_mobile_v2_0_rec_v2_0输入tensor shape应该是[x 3 32 100], 在测试代码中input_tensor0->Resize({1, 3, 32, 100});实际输出的是[0 1 0 3],可能是什么原因导致的
@lqian
Copy link
Author

lqian commented May 5, 2025

完整的测试代码

/*
 * paddle-lite-test.cpp
 *
 *  Created on: May 5, 2025
 */

#include <chrono>
#include <iostream>
#include <fstream>
#include <vector>
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"  // NOLINT

using namespace std;
using namespace paddle::lite_api;  // NOLINT
using namespace cv;


const std::vector<int> rec_image_shape{3, 32, 128};

cv::Mat CrnnResizeImg(cv::Mat img, float wh_ratio, int rec_image_height) {
  int imgC, imgH, imgW;
  imgC = rec_image_shape[0];
  imgH = rec_image_height;
  imgW = rec_image_shape[2];

  imgW = int(imgH * wh_ratio);

  float ratio = float(img.cols) / float(img.rows);
  int resize_w, resize_h;

  if (ceilf(imgH * ratio) > imgW)
    resize_w = imgW;
  else
    resize_w = int(ceilf(imgH * ratio));
  cv::Mat resize_img;
  cv::resize(img, resize_img, cv::Size(resize_w, imgH), 0.f, 0.f,
             cv::INTER_LINEAR);
  cv::copyMakeBorder(resize_img, resize_img, 0, 0, 0,
                     int(imgW - resize_img.cols), cv::BORDER_CONSTANT,
                     {127, 127, 127});
  return resize_img;
}

// fill tensor with mean and scale and trans layout: nhwc -> nchw, neon speed up
void neon_mean_scale(const float* din,
                     float* dout,
                     int size,
                     const std::vector<float> mean,
                     const std::vector<float> scale) {
  if (mean.size() != 3 || scale.size() != 3) {
    std::cerr << "[ERROR] mean or scale size must equal to 3\n";
    exit(1);
  }
  float32x4_t vmean0 = vdupq_n_f32(mean[0]);
  float32x4_t vmean1 = vdupq_n_f32(mean[1]);
  float32x4_t vmean2 = vdupq_n_f32(mean[2]);
  float32x4_t vscale0 = vdupq_n_f32(scale[0]);
  float32x4_t vscale1 = vdupq_n_f32(scale[1]);
  float32x4_t vscale2 = vdupq_n_f32(scale[2]);

  float* dout_c0 = dout;
  float* dout_c1 = dout + size;
  float* dout_c2 = dout + size * 2;

  int i = 0;
  for (; i < size - 3; i += 4) {
    float32x4x3_t vin3 = vld3q_f32(din);
    float32x4_t vsub0 = vsubq_f32(vin3.val[0], vmean0);
    float32x4_t vsub1 = vsubq_f32(vin3.val[1], vmean1);
    float32x4_t vsub2 = vsubq_f32(vin3.val[2], vmean2);
    float32x4_t vs0 = vmulq_f32(vsub0, vscale0);
    float32x4_t vs1 = vmulq_f32(vsub1, vscale1);
    float32x4_t vs2 = vmulq_f32(vsub2, vscale2);
    vst1q_f32(dout_c0, vs0);
    vst1q_f32(dout_c1, vs1);
    vst1q_f32(dout_c2, vs2);

    din += 12;
    dout_c0 += 4;
    dout_c1 += 4;
    dout_c2 += 4;
  }
  for (; i < size; i++) {
    *(dout_c0++) = (*(din++) - mean[0]) * scale[0];
    *(dout_c1++) = (*(din++) - mean[1]) * scale[1];
    *(dout_c2++) = (*(din++) - mean[2]) * scale[2];
  }
}


void pre_process(const cv::Mat& img,
                 int width,
                 int height,
                 const std::vector<float>& mean,
                 const std::vector<float>& scale,
                 float* data,
                 bool is_scale = false) {
  cv::Mat resized_img;
  if (img.cols != width || img.rows != height) {
    cv::resize(
        img, resized_img, cv::Size(width, height), 0.f, 0.f, cv::INTER_CUBIC);
  } else {
    resized_img = img;
  }
  cv::Mat imgf;
  float scale_factor = is_scale ? 1.f / 256 : 1.f;
  resized_img.convertTo(imgf, CV_32FC3, scale_factor);
  const float* dimg = reinterpret_cast<const float*>(imgf.data);
  neon_mean_scale(dimg, data, width * height, mean, scale);
}


int rec_image_height = 32;


vector<string> dict;

void print_shape(const char * prefix, const shape_t & shape)
{
	printf("%s shape", prefix);
	for (auto s: shape)
	{
		printf(" %ld", s);
	}
	printf("\n");
}

int main(int argc, char ** argv)
{
	if (argc < 4)
	{
		printf("usage: paddle-lite-test [/path/to/modelfile] [/path/to/image]  [/path/to/dict]\n");
		exit(1);
	}

	cv::Mat img = cv::imread(argv[2]);
	if (img.empty())
	{
		printf("error: empty image \n");
		exit(1);
	}


	ifstream in;
	in.open(argv[3]);
	if (in.is_open())
	{
		string token;
		while (in >> token)
		{
			dict.push_back(token);
		}
	}

	if (dict.size() == 0)
	{
		printf("expect an valid dictionary text file \n");
		exit(1);
	}
	printf("load %d dictionary tokens\n", dict.size());


	string model_file = argv[1];
	// 1. Set MobileConfig
	MobileConfig config;
	config.set_model_from_file(model_file);

	// 2. Create PaddlePredictor by MobileConfig
	std::shared_ptr<PaddlePredictor> predictor = CreatePaddlePredictor<MobileConfig>(config);
	printf("create predictor :%d \n", predictor.get());

	vector<string> inputNames = predictor->GetInputNames();
	for (const string & inputName : inputNames)
	{
		printf("input name: %s\n", inputName.c_str());
	}
	printf("--------- showed input names ---------\n");
	// 3. Prepare input data from image
	// only has one input
	std::unique_ptr<Tensor> input_tensor0(std::move(predictor->GetInputByName("x")));
	printf("--------- GetInputByName(\"x\")---------\n");  //
	input_tensor0->Resize({1, 3, 32, 100});
	shape_t shape = input_tensor0->shape(); //
	printf("get input tensor shape dimension: %d \n", shape.size());
	printf("input tensor shape dimension: %ld %ld %ld %ld\n", shape[0], shape[1], shape[2], shape[3]);

	auto* data = input_tensor0->mutable_data<float>();
	std::vector<float> mean = {127.5, 127.5, 127.5};
	std::vector<float> scale = {0.007843, 0.007843, 0.007843};  // 1/127.5
	pre_process(img, 100, 32, mean, scale, data, false);

//	NeonMeanScale(dimg, data0, resize_img.rows * resize_img.cols, mean, scale);
	auto inference_start = std::chrono::steady_clock::now();
	printf("predictor->Run() before\n");
	predictor->Run();
	printf("predictor->Run() done\n");
    // Get output and run postprocess
    std::unique_ptr<const Tensor> output_tensor0 = predictor->GetOutput(0);

    auto predict_shape = output_tensor0->shape();
    auto inference_end = std::chrono::steady_clock::now();
    print_shape("output predict", predict_shape);

    vector<string> outputNames = predictor->GetOutputNames();
    for (const string & name : outputNames)
    {
    	printf("output name: %s\n", name.c_str());
    }
    printf("--------- showed output names ---------\n");


    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(inference_end - inference_start);
    printf("inference cost: %.3f", duration * 0.001);
    // ctc decode
    auto postprocess_start = std::chrono::steady_clock::now();

    auto *predict_batch = output_tensor0->data<float>();
}



@ddchenhao66
Copy link
Collaborator

tensor的resize是很基础的操作,不太可能有bug,倾向于怀疑导出的nb模型有问题。建议检查下模型是否正常,多加些打印看看。

@ddchenhao66
Copy link
Collaborator

另外可以看下头文件和对应的库是否对应上,以及拿input_tensor0可以用predictor->GetInput(0)的接口试试,看着像是拿到了一个非法tensor,内存读错了

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants