Finding a small image in a large image

Jun 11, 2012 at 8:20 AM
Edited Jun 12, 2012 at 9:16 AM

Hi Nick!

I'm sorry to disturb you, but I can't find a solution to my problem.
I'd like to implement a fine matching algorithm in CUDAfy.

I have a GeForce 210 graphics card with compute capability 1.2.

I copy the large and the small image into device memory. Then I would like to find the small picture in the large picture.

For example: small picture 50*50, large picture 150*150.

In the first step I take coordinate (0,0) of the large picture and (0,0) of the small picture.
I select a 50*50 area in the large picture and compare the two equally sized pictures.
Then I step to the next pixel in the large picture, (1,0), and repeat the comparison, then (2,0), (3,0), and so on.
The images are in 24-bit format (RGB without alpha).
I need to know how many pixels match at each position (x,y) of the large picture, and I try to store this count in my histogram. For example, at coordinate (10,10) there were 2465 matching pixels (the maximum is 2500, because the small picture has 50*50 pixels).
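
As a reference point, the per-position count described above can be written as a plain CPU loop. This is a minimal sketch in Python, not CUDAfy code. The function name `count_matching_pixels`, the tightly packed layout of 3 bytes per pixel with no row padding, and the tiny test images are assumptions of the sketch; the strict `<` comparison against the threshold follows the kernels posted in this thread.

```python
def count_matching_pixels(small, small_w, small_h, large, large_w, x, y, threshold):
    """Count pixels of the small image that match the large image at offset (x, y).

    Both images are flat byte sequences, 3 bytes per pixel, rows tightly
    packed (no row padding). That layout is an assumption of this sketch.
    """
    same = 0
    for sy in range(small_h):
        for sx in range(small_w):
            s_off = (sy * small_w + sx) * 3
            l_off = ((y + sy) * large_w + (x + sx)) * 3
            # A pixel matches only if all three channels are within the threshold.
            if all(abs(small[s_off + c] - large[l_off + c]) < threshold for c in range(3)):
                same += 1
    return same


# Hypothetical 4x4 large image: pixel number i has all three channels set to 10 * i.
large_w = large_h = 4
large = bytes(v for i in range(large_w * large_h) for v in (10 * i,) * 3)

# 2x2 small image cut out of the large image at (1, 1): pixel numbers 5, 6, 9, 10.
small = bytes(v for i in (5, 6, 9, 10) for v in (10 * i,) * 3)

at_origin = count_matching_pixels(small, 2, 2, large, large_w, 0, 0, 1)
at_match = count_matching_pixels(small, 2, 2, large, large_w, 1, 1, 1)
```

The valid offsets run from 0 to (large width - small width) inclusive in x, and likewise in y, so a result computed this way can serve as a ground truth when checking the GPU histogram.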

My launch looks like this:

int width = inputLarge.Width - inputSmall.Width;    // large picture width - small picture width (pixels)
int height = inputLarge.Height - inputSmall.Height; // large picture height - small picture height (pixels)

dim3 gridDim = new dim3(width, height * inputSmall.Height);
dim3 blockDim = new dim3(inputSmall.Width);

gpu.Launch(gridDim, blockDim, "gpuFineMatchLinear", device_small, device_large, device_histogram, width, height, treshold, inputLarge.Width, inputSmall.Width, inputSmall.Height);

        public static void gpuFineMatch(GThread thread, byte[] small, byte[] large, int[] histogram,
            int width,
            int height,
            int treshold, int largeWidth, int smallWidth, int smallHeight)
        {
            //--> Position of histogram and large picture
            int hx = thread.blockIdx.x / smallHeight;
            int hy = thread.blockIdx.y;

            //--> Position of small picture
            int kx = thread.threadIdx.x;
            int ky = thread.blockIdx.x % smallHeight;

            //--> Offsets (nagykepOFF = large picture offset, kiskepOFF = small picture offset)
            int histoOFF = hy * width + hx;
            int nagykepOFF = (hy * 3 * largeWidth) + hx * 3;
            int kiskepOFF = (ky * 3 * smallWidth) + kx * 3;

            int same = 0;

            // ertek = difference of one color channel
            int ertek = (int)small[kiskepOFF] - large[nagykepOFF];
            same += (ertek < treshold && ertek > -treshold) ? 1 : 0;
            ertek = (int)small[kiskepOFF + 1] - large[nagykepOFF + 1];
            same += (ertek < treshold && ertek > -treshold) ? 1 : 0;
            ertek = (int)small[kiskepOFF + 2] - large[nagykepOFF + 2];
            same += (ertek < treshold && ertek > -treshold) ? 1 : 0;
            if (same == 3) thread.atomicAdd(ref histogram[histoOFF], 1);
        }

I hope you can help me.

I have a working solution, but it uses a lot of CPU time, because I Clone() a cropped picture at every pixel of the large image and only then use the GPU. So it is very slow.
Sorry for my bad English; I am writing to you from Hungary.
Thank you!

Jun 13, 2012 at 7:15 PM


I trust you are getting the correct results? The key things are to minimize data transfers between host and GPU and to be careful when using atomics; you may be better off using shared memory.

I'd recommend using NVIDIA's Compute Visual Profiler to see where your bottlenecks are.


Jun 14, 2012 at 10:07 AM


Thank you for the answer. Yes, I know; that is why I copy the two pictures to GPU memory only once. I need atomics because I use a histogram.

I tried NVIDIA's Compute Visual Profiler; it is a good program for collecting timing statistics for the program.

I use Nsight to debug my code, but I cannot inspect some of the variables that I need in order to fix my problem.

I have written some algorithms in CUDAfy, for example: convolution matrix (3x3, 5x5), subtraction, Fine Matching (which uses the CPU for cropping the image), motion detection...
Except for Fine Matching, they all run in real time with a camera (~25 fps).

All I need is how to index the two pictures.
I can use only the grid dimensions X and Y (up to 65535*65535) and the block dimensions X and Y. On my graphics card the grid does not have a third dimension (Z).

Grid dimension X is (large picture width - small picture width) = width (blockIdx.x).
Grid dimension Y is (large picture height - small picture height) = height, multiplied by the small picture height (blockIdx.y). I had to fold the small picture height in here because I do not have the third dimension, Z.
Block dimension X is the small picture width (threadIdx.x).

The maximum block dimension is 512, so the small picture can be at most 512 pixels wide.
I had to fold the height of the small picture into grid dimension Y.
I believe this is why I can't get the right indexes for the pictures. :(

Once it has the indexes, the program subtracts the two pictures, fills the histogram array using atomic operations, and "returns" the histogram.
The histogram array contains values, but not the correct ones.

I changed the CUDA code:


        public static void gpuFineMatch(GThread thread, byte[] small, byte[] large, int[] histogram,
            int width /*(large picture width - small picture width)(pixel)*/,
            int height /*(large picture height - small picture height)(pixel)*/,
            int treshold, int largeWidth, int smallWidth, int smallHeight)
        {
            //--> Position of histogram and large picture
            int hx = thread.blockIdx.x;
            int hy = thread.blockIdx.y / smallHeight;

            //--> Position of small picture
            int sx = thread.threadIdx.x;
            int sy = thread.blockIdx.y % smallHeight;

            //--> Actual position of large picture
            int lx = hx + sx;
            int ly = hy + sy;

            //--> Offsets
            int histoOFF = hy * width + hx;
            int largePicOFF = ly * 3 * largeWidth + lx * 3;
            int smallPicOFF = sy * 3 * smallWidth + sx * 3;

            int b = (int)GMath.Abs((int)small[smallPicOFF] - (int)large[largePicOFF]);
            int g = (int)GMath.Abs((int)small[smallPicOFF + 1] - (int)large[largePicOFF + 1]);
            int r = (int)GMath.Abs((int)small[smallPicOFF + 2] - (int)large[largePicOFF + 2]);

            if (b < treshold && g < treshold && r < treshold)
                thread.atomicAdd(ref histogram[histoOFF], 1);
        }

I get wrong values in the histogram.

Thank you!
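
The index arithmetic of the revised kernel can be checked on the CPU without a GPU or debugger. The following Python sketch (not CUDAfy code) replays the launch shape from this post and the `hx`, `hy`, `sx`, `sy` decomposition for every simulated block and thread, then counts how often each combination occurs. The helper names `launch_shape`, `decode`, and `simulate`, and the small test sizes, are hypothetical; the two limits are the compute capability 1.x values.

```python
from collections import Counter

MAX_GRID_DIM = 65535          # per-dimension grid limit on compute capability 1.x
MAX_THREADS_PER_BLOCK = 512   # threads-per-block limit on compute capability 1.x


def launch_shape(large_w, large_h, small_w, small_h):
    """Grid and block sizes as in the post: grid = (width, height * small_h), block = small_w."""
    width = large_w - small_w
    height = large_h - small_h
    return (width, height * small_h), small_w


def decode(block_x, block_y, thread_x, small_h):
    """Mirror of the revised kernel's index arithmetic."""
    hx, hy = block_x, block_y // small_h   # histogram cell (window origin in the large picture)
    sx, sy = thread_x, block_y % small_h   # pixel inside the small picture
    return hx, hy, sx, sy


def simulate(large_w, large_h, small_w, small_h):
    """Enumerate every simulated thread and count each decoded index tuple."""
    (grid_x, grid_y), block_x = launch_shape(large_w, large_h, small_w, small_h)
    assert grid_x <= MAX_GRID_DIM and grid_y <= MAX_GRID_DIM
    assert block_x <= MAX_THREADS_PER_BLOCK
    seen = Counter()
    for by in range(grid_y):
        for bx in range(grid_x):
            for tx in range(block_x):
                seen[decode(bx, by, tx, small_h)] += 1
    return seen


# Small hypothetical sizes: 6x5 large picture, 3x2 small picture.
seen = simulate(6, 5, 3, 2)
per_cell = Counter((hx, hy) for (hx, hy, sx, sy) in seen)
```

For these sizes every (hx, hy, sx, sy) combination appears exactly once and every histogram cell receives one thread per small-picture pixel. That suggests, without proving it, that the wrong counts may have another cause, for example a histogram buffer that was not zeroed before the launch or row padding in the bitmap bytes; neither is confirmed in this thread.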

Jul 13, 2012 at 8:42 AM


The problem is solved. I no longer use atomics; I use a for loop and global memory instead.
It is as fast as I need.
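
The final code was not posted. One common way to drop the atomics, assumed here and not confirmed as the poster's method, is to give each thread one histogram cell: the thread loops over the whole small picture and writes its count to global memory once. The Python sketch below models that structure on the CPU; `fine_match_thread`, `launch`, the 2x2 block size, and the test images are hypothetical, and rows are assumed to be tightly packed at 3 bytes per pixel.

```python
def fine_match_thread(block_idx, thread_idx, block_dim, small, small_w, small_h,
                      large, large_w, histogram, width, height, threshold):
    """Body of one simulated GPU thread; it owns exactly one histogram cell."""
    hx = block_idx[0] * block_dim[0] + thread_idx[0]
    hy = block_idx[1] * block_dim[1] + thread_idx[1]
    if hx >= width or hy >= height:
        return  # guard: the grid is rounded up, so some threads have no cell
    same = 0
    for sy in range(small_h):
        for sx in range(small_w):
            s_off = (sy * small_w + sx) * 3
            l_off = ((hy + sy) * large_w + (hx + sx)) * 3
            if all(abs(small[s_off + c] - large[l_off + c]) < threshold for c in range(3)):
                same += 1
    histogram[hy * width + hx] = same  # one plain store per thread, no atomicAdd


def launch(grid_dim, block_dim, *args):
    """Sequential stand-in for a 2D kernel launch."""
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    fine_match_thread((bx, by), (tx, ty), block_dim, *args)


# Hypothetical 5x4 large picture: pixel number i has all three channels set to 10 * i.
large_w, large_h = 5, 4
large = bytes(v for i in range(large_w * large_h) for v in (10 * i,) * 3)
# 2x2 small picture cut out at (2, 1): pixel numbers 7, 8, 12, 13.
small_w, small_h = 2, 2
small = bytes(v for i in (7, 8, 12, 13) for v in (10 * i,) * 3)

width, height = large_w - small_w, large_h - small_h   # same convention as the post
histogram = [0] * (width * height)
block_dim = (2, 2)
grid_dim = ((width + block_dim[0] - 1) // block_dim[0],
            (height + block_dim[1] - 1) // block_dim[1])
launch(grid_dim, block_dim, small, small_w, small_h, large, large_w,
       histogram, width, height, 1)
```

Because no two threads write to the same cell, the result does not depend on the order in which threads run, and the histogram does not need to be cleared beforehand. The trade-off is less parallelism per position, since each thread does small width * small height comparisons on its own.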