OpenCL Histogram for BGRA image

Sep 25, 2013 at 10:58 PM
Hi, I started using Cudafy recently and I must say it really is a great product.

What I want to do is calculate the histogram of an image represented by a vector of bytes organized as BGRA. I implemented the algorithm in C# using Interlocked.Increment, but I could not find a similar function in Cudafy using OpenCL. Unfortunately CUDA is not an option because the computer on which I run the code has a Radeon HD6450.
        int length = src.Length / 4;
        int[] luminance = new int[256];
        int[] red = new int[256];
        int[] green = new int[256];
        int[] blue = new int[256];

        Parallel.For(0, length, (i) =>
        {
            // Each pixel is 4 bytes in B, G, R, A order.
            int x = i * 4;
            Interlocked.Increment(ref luminance[(int)(0.3 * src[x + 2] + 0.59 * src[x + 1] + 0.11 * src[x])]);
            Interlocked.Increment(ref red[src[x + 2]]);
            Interlocked.Increment(ref green[src[x + 1]]);
            Interlocked.Increment(ref blue[src[x]]);
        });

        return new Histogram(luminance, red, green, blue);
Can you advise me on how to implement this algorithm or where I could find the appropriate documentation?

Thank you very much,
Alberto Nuti.
Sep 25, 2013 at 11:58 PM
You need to use atomics; they are also available for OpenCL.
You might want to split it into 2 stages in order to use atomics only on shared memory:
1 - each block constructs its own version of the histogram, using atomics on shared memory and coalesced loads from global memory.
2 - a reduction process merges all the per-block histograms into the final one.
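
To make the two-stage idea concrete, here is a minimal CPU-side sketch in plain C#. It is only an analogue, not Cudafy or OpenCL code: each Parallel.For worker plays the role of a block and counts into its own private histogram, and the merge at the end plays the role of the reduction. On a GPU the threads of one block share a histogram, so stage 1 would still need atomics on shared memory; here each worker's copy is private, so only the merge is synchronized. The names PrivatizedHistogram and BlueChannel are made up for this illustration, and only the blue channel is counted to keep it short.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PrivatizedHistogram
{
    // Counts how often each blue value occurs in a BGRA byte buffer.
    public static int[] BlueChannel(byte[] src)
    {
        int pixels = src.Length / 4;
        int[] total = new int[256];

        Parallel.For(0, pixels,
            // Stage 1a: every worker starts with its own empty histogram.
            () => new int[256],
            // Stage 1b: the worker counts into its private copy (no atomics).
            (i, state, local) =>
            {
                local[src[i * 4]]++;   // byte 0 of each pixel is blue
                return local;
            },
            // Stage 2: merge each private histogram into the shared result.
            local =>
            {
                for (int bin = 0; bin < 256; bin++)
                {
                    if (local[bin] != 0)
                        Interlocked.Add(ref total[bin], local[bin]);
                }
            });

        return total;
    }

    public static void Main()
    {
        // Three BGRA pixels: two with blue = 10, one with blue = 20.
        byte[] src = { 10, 0, 0, 255, 10, 0, 0, 255, 20, 0, 0, 255 };
        int[] blue = BlueChannel(src);
        Console.WriteLine("blue[10] = {0}, blue[20] = {1}", blue[10], blue[20]);
    }
}
```

The point of the split is that the hot loop touches only memory owned by one worker (or, on the GPU, one block), so contention is limited to the short merge step.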

Sep 26, 2013 at 9:28 AM
When I try to use thread.atomicInc or thread.atomicAdd, the app crashes when that instruction
is reached...

A NotSupportedException is thrown.
Sep 26, 2013 at 2:11 PM
Odd. I just tried the histogram example in CudafyByExample in OpenCL mode and it went smoothly.
Would you mind running that example as well, to see if you still can't get it working? Perhaps you found an unforeseen problem with Cudafy's OpenCL integration on ATI cards.
Sep 27, 2013 at 8:35 AM
Edited Sep 27, 2013 at 8:53 AM
I tried the example on the computer I develop on (NVIDIA 9600GT) and I got the exception "BuildProgramFailure". Then I tried the example on my workstation and it worked perfectly (AMD HD 6450 + AMD Turion II Neo N40L DualCore).
The only problem is that the test correctly reports the name of the video card but says "Processors: 2". My concern is that the example is running on the processor (OpenCL 1.2).

Wikipedia says that the atomic functions are available starting with version 2.0, while on the NVIDIA developer forum someone managed to use the atomic functions on an 8800GT (OpenCL 1.0). What is the minimum version of OpenCL for atomic functions with Cudafy?

P.S. I run my software on my workstation using RemoteDebugger.
I managed to solve the problem by rewriting my code following the example and it works fine on my workstation.
As a hobby I started working on image editing software, similar to Adobe Lightroom. My new questions arise because I want to reuse the code that I wrote for my project and make it compatible with more hardware.
Sep 27, 2013 at 9:44 AM
The compute capability number applies only to NVIDIA cards. It is not related to the version of your OpenCL drivers.
On NVIDIA: you'll only be able to run atomics if the compute capability is high enough.
On ATI: it depends on the specific hardware installed.
Sep 27, 2013 at 9:47 AM
"My concern is that the example is running on the processor (OpenCL 1.2)."

Inspect the properties of the gpu object once acquired and see if it matches your HD6450. You can specify the device number when you acquire it.