opencv
diff --git a/modules/ccalib/tutorials/multi_camera_tutorial.markdown b/modules/ccalib/tutorials/multi_camera_tutorial.markdown
Lines changed: 5 additions & 5 deletions
diff --git a/modules/cudaarithm/include/opencv2/cudaarithm.hpp b/modules/cudaarithm/include/opencv2/cudaarithm.hpp
Lines changed: 7 additions & 3 deletions
diff --git a/modules/cudaarithm/src/cuda/threshold.cu b/modules/cudaarithm/src/cuda/threshold.cu
Lines changed: 245 additions & 1 deletion
@@ -5,7 +5,7 @@ This tutorial will show how to use the multiple camera calibration toolbox. This
 
 Random Pattern Calibration Object
 -------------------------------
-The random pattern is an image that is randomly generated. It is "random" so that it has many feature points. After generating it, one print it out and use it as a calibration object. The following two images are random pattern and a photo taken for it.
+The random pattern is an image that is randomly generated. It is "random" so that it has many feature points. After generating it, print it out and use it as a calibration object. The following two images show a random pattern and a photo taken of it.
 
 ![image](img/random_pattern.jpg)
 ![image](img/pattern_img.jpg)
@@ -14,7 +14,7 @@ To generate a random pattern, use the class ```cv::randpattern::RandomPatternGen
 ```
 cv::randpattern::RandomPatternGenerator generator(width, height);
 generator.generatePattern();
-pattern = generator.getPattern();
+cv::Mat pattern = generator.getPattern();
 ```
 Here ```width``` and ```height``` are width and height of pattern image. After getting the pattern, print it out and take some photos of it.
 
@@ -26,20 +26,20 @@ finder.computeObjectImagePoints(vecImg);
 vector<Mat> objectPoints = finder.getObjectPoints();
 vector<Mat> imagePoints = finder.getImagePoints();
 ```
-Here variable ```patternWidth``` and ```patternHeight``` are physical pattern width and height with some user defined unit. ```vecImg``` is a vector of images that stores calibration images.
+Here the variables ```patternWidth``` and ```patternHeight``` refer to the physical dimensions of the calibration object in the chosen unit of measurement. ```vecImg``` is a vector that stores the calibration images.
 
 Second, use calibration functions like ```cv::calibrateCamera``` or ```cv::omnidir::calibrate``` to calibrate camera.
 
 Multiple Cameras Calibration
 -------------------------------
 Now we move to multiple camera calibration, so far this toolbox must use random pattern object.
 
-To calibrate multiple cameras, we first need to take some photos of random pattern. Of cause, to calibrate the extrinsic parameters, one pattern need to be viewed by multiple cameras (at least two) at the same time. Another thing is that to help the program know which camera and which pattern the photo is taken, the image file should be named as "cameraIdx-timestamp.*". Photos with same timestamp means that they are the same object taken by several cameras. In addition, cameraIdx should start from 0. Some examples of files names are "0-129.png", "0-187.png", "1-187", "2-129".
+To calibrate multiple cameras, we first need to take some photos of the random pattern. Of course, to calibrate the extrinsic parameters, the pattern needs to be viewed by multiple cameras (at least two) at the same time. To let the program know which camera took each photo and which view of the pattern it shows, each image file should be named "cameraIdx-timestamp.*". Photos with the same timestamp show the same object captured by several cameras. In addition, cameraIdx should start from 0. Some examples of file names are "0-129.png", "0-187.png", "1-187", "2-129".
 
 Then, we can run multiple cameras calibration as
 ```
 cv::multicalib::MultiCameraCalibration multiCalib(cameraType, nCamera, inputFilename,patternWidth, patternHeight, showFeatureExtraction, nMiniMatches);
 multiCalib.run();
 multiCalib.writeParameters(outputFilename);
 ```
-Here ```cameraType``` indicates the camera type, ```multicalib::MultiCameraCalibration::PINHOLE``` and ```multicalib::MultiCameraCalibration::OMNIDIRECTIONAL``` are supported. For omnidirectional camera, you can refer to ```cv::omnidir``` module for detail. ```nCamera``` is the number of camers. ```inputFilename``` is the name of a file generated by ```imagelist_creator``` from ```opencv/sample```. It stores names of random pattern and calibration images, the first file name is the name of random pattern. ```patternWidth``` and ```patternHeight``` are physical width and height of pattern. ```showFeatureExtraction``` is a flags to indicate whether show feature extraction process. ```nMiniMatches``` is a minimal points that should be detected in each frame, otherwise this frame will be abandoned. ```outputFilename``` is a xml file name to store parameters.
+Here ```cameraType``` indicates the camera type; ```multicalib::MultiCameraCalibration::PINHOLE``` and ```multicalib::MultiCameraCalibration::OMNIDIRECTIONAL``` are supported. For omnidirectional cameras, refer to the ```cv::omnidir``` module for details. ```nCamera``` is the number of cameras. ```inputFilename``` is the name of a file generated by ```imagelist_creator``` from ```opencv/sample```. It stores the file names of the random pattern and the calibration images; the first file name is that of the random pattern. ```patternWidth``` and ```patternHeight``` represent the physical width and height of the pattern. ```showFeatureExtraction``` is a boolean flag that determines whether the feature extraction process is displayed. ```nMiniMatches``` is the minimum number of points that must be detected in a frame; frames with fewer detected points are discarded. ```outputFilename``` is the name of an XML file that will store the calibration parameters.
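A small host-side sketch of the file naming convention described in the tutorial, assuming only the C++17 standard library. `CalibImageName`, `parseCalibImageName` and `sharedViews` are hypothetical helpers for sanity-checking an image set before calibration; they are not part of `cv::multicalib`, whose own parsing may differ. The sketch groups names by timestamp and keeps only the timestamps captured by two or more cameras, since only those views can link cameras for the extrinsic calibration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of cv::multicalib: the two fields encoded in a
// "cameraIdx-timestamp.*" calibration image name.
struct CalibImageName {
    int cameraIdx;
    long timestamp;
};

// Returns true if s is a non-empty run of ASCII digits.
static bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s)
        if (c < '0' || c > '9') return false;
    return true;
}

// Parses names such as "0-129.png" or "1-187"; returns std::nullopt when the
// name does not follow the convention. Any directory prefix is ignored.
std::optional<CalibImageName> parseCalibImageName(const std::string& path) {
    std::size_t slash = path.find_last_of("/\\");
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);

    std::size_t dash = base.find('-');
    if (dash == std::string::npos) return std::nullopt;
    std::size_t dot = base.find('.', dash + 1);  // the extension is optional

    std::string cam = base.substr(0, dash);
    std::string ts = (dot == std::string::npos) ? base.substr(dash + 1)
                                                : base.substr(dash + 1, dot - dash - 1);
    if (!allDigits(cam) || !allDigits(ts)) return std::nullopt;
    return CalibImageName{std::stoi(cam), std::stol(ts)};
}

// Maps each timestamp to the set of cameras that captured it, keeping only
// timestamps seen by at least two cameras.
std::map<long, std::set<int>> sharedViews(const std::vector<std::string>& names) {
    std::map<long, std::set<int>> byTimestamp;
    for (const std::string& n : names)
        if (auto parsed = parseCalibImageName(n))
            byTimestamp[parsed->timestamp].insert(parsed->cameraIdx);
    for (auto it = byTimestamp.begin(); it != byTimestamp.end();)
        it = (it->second.size() < 2) ? byTimestamp.erase(it) : std::next(it);
    return byTimestamp;
}
```

For the tutorial's example names "0-129.png", "0-187.png", "1-187" and "2-129", `sharedViews` yields timestamp 129 seen by cameras {0, 2} and timestamp 187 seen by cameras {0, 1}.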
@@ -546,12 +546,16 @@ static inline void scaleAdd(InputArray src1, double alpha, InputArray src2, Outp
 
 /** @brief Applies a fixed-level threshold to each array element.
 
+The special value cv::THRESH_OTSU may be combined with one of the other types. In this case, the function determines the
+optimal threshold value using Otsu's method and uses it instead of the specified threshold. The function returns the
+computed threshold value and writes the thresholded matrix to dst.
+Otsu's method is implemented only for 8-bit matrices.
+
 @param src Source array (single-channel).
-@param dst Destination array with the same size and type as src .
+@param dst Destination array with the same size and type as src.
 @param thresh Threshold value.
 @param maxval Maximum value to use with THRESH_BINARY and THRESH_BINARY_INV threshold types.
-@param type Threshold type. For details, see threshold . The THRESH_OTSU and THRESH_TRIANGLE
-threshold types are not supported.
+@param type Threshold type. For details, see threshold. The THRESH_TRIANGLE threshold type is not supported.
 @param stream Stream for the asynchronous version.
 
 @sa threshold
 
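The combination rules in the updated documentation can be illustrated with a host-side sketch. The constant values below match `cv::ThresholdTypes` in imgproc; `DecodedThresholdType` and `decodeThresholdType` are hypothetical names, and the checks only mirror the documented constraints (Otsu only for 8-bit input, no triangle method) rather than reproduce the module's code.

```cpp
#include <cassert>
#include <stdexcept>

// Values as defined by cv::ThresholdTypes in OpenCV's imgproc module.
constexpr int THRESH_BINARY = 0;
constexpr int THRESH_BINARY_INV = 1;
constexpr int THRESH_TRUNC = 2;
constexpr int THRESH_TOZERO = 3;
constexpr int THRESH_TOZERO_INV = 4;
constexpr int THRESH_MASK = 7;
constexpr int THRESH_OTSU = 8;
constexpr int THRESH_TRIANGLE = 16;

// Hypothetical result type: the base threshold type plus the Otsu flag.
struct DecodedThresholdType {
    int baseType;
    bool useOtsu;
};

// Splits a combined type such as THRESH_BINARY | THRESH_OTSU and rejects the
// combinations the documentation rules out.
DecodedThresholdType decodeThresholdType(int type, bool srcIs8U) {
    if (type & THRESH_TRIANGLE)
        throw std::invalid_argument("THRESH_TRIANGLE is not supported");
    if (type & ~(THRESH_MASK | THRESH_OTSU))
        throw std::invalid_argument("unknown threshold flag");
    const bool useOtsu = (type & THRESH_OTSU) != 0;
    if (useOtsu && !srcIs8U)
        throw std::invalid_argument("THRESH_OTSU requires an 8-bit source");
    const int baseType = type & THRESH_MASK;
    if (baseType > THRESH_TOZERO_INV)
        throw std::invalid_argument("unknown threshold type");
    return {baseType, useOtsu};
}
```

With this decoding, `THRESH_BINARY | THRESH_OTSU` on an 8-bit input selects binary thresholding at the Otsu value, while the same type on a floating-point input is rejected, consistent with the `CV_Assert(depth == CV_8U)` check in the implementation.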
@@ -95,12 +95,256 @@ namespace
     }
 }
 
-double cv::cuda::threshold(InputArray _src, OutputArray _dst, double thresh, double maxVal, int type, Stream& stream)
+
+__global__ void otsu_sums(uint *histogram, uint *threshold_sums, unsigned long long *sums)
+{
+    const uint n_bins = 256;
+
+    __shared__ uint shared_memory_ts[n_bins];
+    __shared__ unsigned long long shared_memory_s[n_bins];
+
+    int bin_idx = threadIdx.x;
+    int threshold = blockIdx.x;
+
+    uint threshold_sum_above = 0;
+    unsigned long long sum_above = 0;
+
+    if (bin_idx > threshold)
+    {
+        uint value = histogram[bin_idx];
+        threshold_sum_above = value;
+        sum_above = (unsigned long long) value * bin_idx; // widen before multiplying to avoid 32-bit overflow
+    }
+
+    blockReduce<n_bins>(shared_memory_ts, threshold_sum_above, bin_idx, plus<uint>());
+    blockReduce<n_bins>(shared_memory_s, sum_above, bin_idx, plus<unsigned long long>());
+
+    if (bin_idx == 0)
+    {
+        threshold_sums[threshold] = threshold_sum_above;
+        sums[threshold] = sum_above;
+    }
+}
+
+__global__ void
+otsu_variance(float2 *variance, uint *histogram, uint *threshold_sums, unsigned long long *sums)
+{
+    const uint n_bins = 256;
+
+    __shared__ signed long long shared_memory_a[n_bins];
+    __shared__ signed long long shared_memory_b[n_bins];
+
+    int bin_idx = threadIdx.x;
+    int threshold = blockIdx.x;
+
+    uint n_samples = threshold_sums[0];
+    uint n_samples_above = threshold_sums[threshold];
+    uint n_samples_below = n_samples - n_samples_above;
+
+    unsigned long long total_sum = sums[0];
+    unsigned long long sum_above = sums[threshold];
+    unsigned long long sum_below = total_sum - sum_above;
+
+    float threshold_variance_above_f32 = 0;
+    float threshold_variance_below_f32 = 0;
+    if (bin_idx > threshold)
+    {
+        // Skip empty classes so that 0 / 0 never produces NaN
+        if (n_samples_above > 0)
+        {
+            float mean = (float) sum_above / n_samples_above;
+            float sigma = bin_idx - mean;
+            threshold_variance_above_f32 = sigma * sigma;
+        }
+    }
+    else if (n_samples_below > 0)
+    {
+        float mean = (float) sum_below / n_samples_below;
+        float sigma = bin_idx - mean;
+        threshold_variance_below_f32 = sigma * sigma;
+    }
+
+    uint bin_count = histogram[bin_idx];
+    signed long long threshold_variance_above_i64 = (signed long long)(threshold_variance_above_f32 * bin_count);
+    signed long long threshold_variance_below_i64 = (signed long long)(threshold_variance_below_f32 * bin_count);
+    blockReduce<n_bins>(shared_memory_a, threshold_variance_above_i64, bin_idx, plus<signed long long>());
+    blockReduce<n_bins>(shared_memory_b, threshold_variance_below_i64, bin_idx, plus<signed long long>());
+
+    if (bin_idx == 0)
+    {
+        variance[threshold] = make_float2(threshold_variance_above_i64, threshold_variance_below_i64);
+    }
+}
+
+template <uint n_thresholds>
+__device__ bool has_lowest_score(
+    uint threshold, float original_score, float score, uint *shared_memory
+) {
+    // It may happen that multiple threads have the same minimum score. In that case, we want to find the thread with
+    // the lowest threshold. This is done by calling '__syncthreads_count' to count how many threads have a score
+    // that matches the minimum score found. Since this is rare, we will optimize towards the common case where only
+    // one thread has the minimum score. If multiple threads have the same minimum score, we will find the minimum
+    // threshold that satisfies the condition.
+    bool has_match = original_score == score;
+    uint matches = __syncthreads_count(has_match);
+
+    if(matches > 1) {
+        // If this thread has a match, we use it; otherwise we give it a value that is larger than the maximum
+        // threshold, so it will never get picked
+        uint min_threshold = has_match ? threshold : n_thresholds;
+
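+        blockReduce<n_thresholds>(shared_memory, min_threshold, threshold, minimum<uint>());
+
+        // Only thread 0 is guaranteed to hold the fully reduced minimum, so broadcast it through shared
+        // memory; otherwise several matching threads could report a match and race on the output
+        if (threshold == 0)
+        {
+            shared_memory[0] = min_threshold;
+        }
+        __syncthreads();
+
+        return shared_memory[0] == threshold;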
+    } else {
+        return has_match;
+    }
+}
+
+__global__ void
+otsu_score(uint *otsu_threshold, uint *threshold_sums, float2 *variance)
+{
+    const uint n_thresholds = 256;
+
+    __shared__ float shared_memory[n_thresholds];
+
+    int threshold = threadIdx.x;
+
+    uint n_samples = threshold_sums[0];
+    uint n_samples_above = threshold_sums[threshold];
+    uint n_samples_below = n_samples - n_samples_above;
+
+    float threshold_mean_above = (float)n_samples_above / n_samples;
+    float threshold_mean_below = (float)n_samples_below / n_samples;
+
+    float2 variances = variance[threshold];
+    float variance_above = n_samples_above > 0 ? variances.x / n_samples_above : 0.0f;
+    float variance_below = n_samples_below > 0 ? variances.y / n_samples_below : 0.0f;
+
+    float above = threshold_mean_above * variance_above;
+    float below = threshold_mean_below * variance_below;
+    float score = above + below;
+
+    float original_score = score;
+
+    blockReduce<n_thresholds>(shared_memory, score, threshold, minimum<float>());
+
+    if (threshold == 0)
+    {
+        shared_memory[0] = score;
+    }
+    __syncthreads();
+
+    score = shared_memory[0];
+
+    // We found the minimum score, but in some cases multiple threads can have the same score, so we need to find the
+    // lowest threshold
+    if (has_lowest_score<n_thresholds>(threshold, original_score, score, (uint *) shared_memory))
+    {
+        *otsu_threshold = threshold;
+    }
+}
+
+void compute_otsu(uint *histogram, uint *otsu_threshold, Stream &stream)
+{
+    const uint n_bins = 256;
+    const uint n_thresholds = 256;
+
+    cudaStream_t cuda_stream = StreamAccessor::getStream(stream);
+
+    dim3 block_all(n_bins);
+    dim3 grid_all(n_thresholds);
+    dim3 block_score(n_thresholds);
+    dim3 grid_score(1);
+
+    BufferPool pool(stream);
+    GpuMat gpu_threshold_sums(1, n_bins, CV_32SC1, pool.getAllocator());
+    GpuMat gpu_sums(1, n_bins, CV_64FC1, pool.getAllocator());
+    GpuMat gpu_variances(1, n_bins, CV_32FC2, pool.getAllocator());
+
+    otsu_sums<<<grid_all, block_all, 0, cuda_stream>>>(
+        histogram, gpu_threshold_sums.ptr<uint>(), gpu_sums.ptr<unsigned long long>());
+    otsu_variance<<<grid_all, block_all, 0, cuda_stream>>>(
+        gpu_variances.ptr<float2>(), histogram, gpu_threshold_sums.ptr<uint>(), gpu_sums.ptr<unsigned long long>());
+    otsu_score<<<grid_score, block_score, 0, cuda_stream>>>(
+        otsu_threshold, gpu_threshold_sums.ptr<uint>(), gpu_variances.ptr<float2>());
+    CV_CUDEV_SAFE_CALL(cudaGetLastError());
+}
+
+// TODO: Replace this with cv::cuda::calcHist
+template <uint n_bins>
+__global__ void histogram_kernel(
+    uint *histogram, const uint8_t *image, uint width,
+    uint height, uint pitch)
+{
+    __shared__ uint local_histogram[n_bins];
+
+    uint x = blockIdx.x * blockDim.x + threadIdx.x;
+    uint y = blockIdx.y * blockDim.y + threadIdx.y;
+    uint tid = threadIdx.y * blockDim.x + threadIdx.x;
+
+    if (tid < n_bins)
+    {
+        local_histogram[tid] = 0;
+    }
+
+    __syncthreads();
+
+    if (x < width && y < height)
+    {
+        uint8_t value = image[y * pitch + x];
+        atomicInc(&local_histogram[value], 0xFFFFFFFF);
+    }
+
+    __syncthreads();
+
+    if (tid < n_bins)
+    {
+        cv::cudev::atomicAdd(&histogram[tid], local_histogram[tid]);
+    }
+}
+
+// TODO: Replace this with cv::cuda::calcHist
+void calcHist(
+    const GpuMat &src, GpuMat &histogram, Stream &stream)
+{
+    const uint n_bins = 256;
+
+    cudaStream_t cuda_stream = StreamAccessor::getStream(stream);
+
+    dim3 block(128, 4, 1);
+    dim3 grid = dim3(divUp(src.cols, block.x), divUp(src.rows, block.y), 1);
+    CV_CUDEV_SAFE_CALL(cudaMemsetAsync(histogram.ptr<uint>(), 0, n_bins * sizeof(uint), cuda_stream));
+    histogram_kernel<n_bins>
+        <<<grid, block, 0, cuda_stream>>>(
+            histogram.ptr<uint>(), src.ptr<uint8_t>(), (uint) src.cols, (uint) src.rows, (uint) src.step);
+    CV_CUDEV_SAFE_CALL(cudaGetLastError());
+}
+
+double cv::cuda::threshold(InputArray _src, OutputArray _dst, double thresh, double maxVal, int type, Stream &stream)
 {
     GpuMat src = getInputMat(_src, stream);
 
     const int depth = src.depth();
 
+    // Same value as cv::THRESH_OTSU in the imgproc module
+    const int THRESH_OTSU = 8;
+    if ((type & THRESH_OTSU) == THRESH_OTSU)
+    {
+        CV_Assert(depth == CV_8U);
+        CV_Assert(src.channels() == 1);
+
+        BufferPool pool(stream);
+
+        // Find the threshold using Otsu and then run the normal thresholding algorithm
+        GpuMat gpu_histogram(256, 1, CV_32SC1, pool.getAllocator());
+        calcHist(src, gpu_histogram, stream);
+
+        GpuMat gpu_otsu_threshold(1, 1, CV_32SC1, pool.getAllocator());
+        compute_otsu(gpu_histogram.ptr<uint>(), gpu_otsu_threshold.ptr<uint>(), stream);
+
+        cv::Mat mat_otsu_threshold;
+        gpu_otsu_threshold.download(mat_otsu_threshold, stream);
+        stream.waitForCompletion();
+
+        // Overwrite the threshold value with the Otsu value and remove the Otsu flag from the type
+        type = type & ~THRESH_OTSU;
+        thresh = (double) mat_otsu_threshold.at<int>(0);
+    }
+
     CV_Assert( depth <= CV_64F );
     CV_Assert( type <= 4 /*THRESH_TOZERO_INV*/ );
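To check the three Otsu kernels against something easy to read, a host-side reference may help. The sketch below assumes the same conventions as the kernels in this diff: a 256-bin histogram, the "above" class holding bins strictly greater than the candidate threshold, the score being the weighted within-class variance, and ties going to the lowest threshold. `Histogram`, `histogram8u` and `otsuThresholdReference` are hypothetical test helpers, not OpenCV functions. Because the kernels accumulate in `float` and truncate per-bin terms to 64-bit integers, the GPU result could differ from this double-precision reference on near-ties.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Counts 8-bit pixel values; pitch is the row stride in bytes, as in histogram_kernel.
Histogram histogram8u(const std::vector<std::uint8_t>& image, int width, int height, int pitch) {
    Histogram hist{};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            ++hist[image[static_cast<std::size_t>(y) * pitch + x]];
    return hist;
}

// Returns the t in [0, 255] minimizing
//   (n_below / n) * var_below + (n_above / n) * var_above,
// where "below" holds bins <= t and "above" holds bins > t. Empty classes
// contribute zero, and ties resolve to the lowest t.
int otsuThresholdReference(const Histogram& hist) {
    double n = 0.0;
    for (std::uint32_t c : hist) n += c;
    if (n == 0.0) return 0;

    int best = 0;
    double bestScore = 0.0;
    for (int t = 0; t < 256; ++t) {
        double nBelow = 0.0, nAbove = 0.0, sumBelow = 0.0, sumAbove = 0.0;
        for (int b = 0; b < 256; ++b) {
            if (b > t) { nAbove += hist[b]; sumAbove += double(hist[b]) * b; }
            else       { nBelow += hist[b]; sumBelow += double(hist[b]) * b; }
        }
        const double meanBelow = nBelow > 0 ? sumBelow / nBelow : 0.0;
        const double meanAbove = nAbove > 0 ? sumAbove / nAbove : 0.0;

        // Count-weighted sum of squared deviations per class.
        double scatterBelow = 0.0, scatterAbove = 0.0;
        for (int b = 0; b < 256; ++b) {
            if (b > t) { const double d = b - meanAbove; scatterAbove += hist[b] * d * d; }
            else       { const double d = b - meanBelow; scatterBelow += hist[b] * d * d; }
        }
        // (n_c / n) * (scatter_c / n_c) simplifies to scatter_c / n.
        const double score = (scatterBelow + scatterAbove) / n;
        if (t == 0 || score < bestScore) { bestScore = score; best = t; }
    }
    return best;
}
```

For a 10×10 image with half the pixels at 50 and half at 200, every threshold from 50 to 199 separates the two values perfectly, and the lowest-threshold tie-break selects 50.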