
Locking in Cudafy?

Jun 20, 2013 at 8:56 PM
Hello,

I have a 2D array that I'm using to sum the results stored in a 3D array (basically summing out one dimension), but unfortunately there's enough contention that the writes are overwriting each other and I cannot guarantee accurate results.

Is there a way to do fine-grained locking, i.e. lock only the index in the array that is being added to?

Or is there any other solution to make this instruction atomic?
Results[V1, C] += intermediate[T, V2, C];
The whole code section is here:
int tid = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
if (3 * tid < array1.Length)
{
    for (int v1 = 0; v1 < 3; v1++)
    {
        for (int c = 0; c < 3; c++)
        {
            int v2 = array1[tid, v1];
            results[v2, c] += intermediates[tid, v1, c];
        }
    }
}
Jun 21, 2013 at 9:07 AM
For performance reasons I would also store the results first in shared memory or a local register, then synchronize and add.
You need to look at the Cudafy.Atomics namespace and make use of thread.AllocateShared(..). Furthermore, look at the chapter09 hist_gpu_shmem_atomics example in the CudafyByExample project.
All the best,
Nick
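For anyone skimming this thread later, here is a rough sketch of what Nick is describing, adapted to the kernel from the first post: accumulate per-block partial sums for the 3×3 results grid in shared memory, synchronize, then flush each bin to global memory with one atomic per block (the hist_gpu_shmem_atomics pattern). This is an illustrative, untested adaptation, not code from the example project, and note that atomicAdd on float requires compiling for compute capability 2.0 or higher:

```csharp
using Cudafy;
using Cudafy.Atomics;

public static class Kernels
{
    [Cudafy]
    public static void SumOut(GThread thread, int[,] array1,
                              float[,,] intermediates, float[,] results)
    {
        // Per-block partial sums for the 3x3 results grid, in fast shared memory.
        float[,] partial = thread.AllocateShared<float>("partial", 3, 3);

        int t = thread.threadIdx.x;
        if (t < 9)
            partial[t / 3, t % 3] = 0f;   // first 9 threads zero the bins
        thread.SyncThreads();

        int tid = thread.blockIdx.x * thread.blockDim.x + t;
        if (3 * tid < array1.Length)
        {
            for (int v1 = 0; v1 < 3; v1++)
            {
                int v2 = array1[tid, v1];
                for (int c = 0; c < 3; c++)
                    thread.atomicAdd(ref partial[v2, c], intermediates[tid, v1, c]);
            }
        }
        thread.SyncThreads();

        // One atomic per bin per block into the global results array.
        if (t < 9)
            thread.atomicAdd(ref results[t / 3, t % 3], partial[t / 3, t % 3]);
    }
}
```

The shared-memory atomics are much cheaper than contending on global memory, and the final flush reduces global atomic traffic from one atomicAdd per element to nine per block.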
Jun 21, 2013 at 5:16 PM
Thank you for your help!

So I'm getting an "error: no instance of overloaded function "atomicAdd" matches the argument list. Argument types are: (float *, float)"

Even though the autocomplete has the first overloaded method to be "atomicAdd(ref float address, float value)"

As far as I can tell I'm doing it exactly as in the chapter09 hist_gpu_shmem_atomics example... except that they use uint.

Here's the code:
int tid = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
if (3 * tid < array1.Length)
{
    for (int v1 = 0; v1 < 3; v1++)
    {
        for (int c = 0; c < 3; c++)
        {
            int v2 = array1[tid, v1];
            thread.atomicAdd(ref results[v2, c], intermediates[tid, v1, c]);
        }
    }
}
Suggestions?
Jun 21, 2013 at 6:04 PM
Is that a compilation or a runtime error?
Jun 21, 2013 at 6:36 PM
Edited Jun 21, 2013 at 6:38 PM
pedritolo1 wrote:
is that a compilation or a runtime error?
Well, it builds correctly, so I assumed run-time, but on closer inspection the output says: 1 error detected in the compilation of "ProgramName.cpp1.ii"

So I guess it's actually compilation...
Jun 21, 2013 at 6:51 PM
Does the error occur when you run your application? It refers to an error in the Cudafy process, so it is a compilation error from nvcc. What compute capability is your GPU?
Jun 21, 2013 at 7:01 PM
Edited Jun 21, 2013 at 10:22 PM
It's CUDA compute capability 3.0 (6 series), so that shouldn't be an issue.

Any idea how to fix it?
Jun 24, 2013 at 10:42 PM
So is there anything I can do to fix this?

Where would I begin to chase this down?
Jun 25, 2013 at 2:20 AM
Edited Jun 25, 2013 at 2:50 AM
Even though your GPU is compute 3.0, you'll still need to Cudafy explicitly for it. I've read of problems with atomics on floats that show exactly the same error as yours when compiled for a lower compute capability. Maybe that'll fix it?

Edit: Yup, just tested; I was correct in my previous assertion.
Jun 25, 2013 at 3:29 PM
Edited Jun 25, 2013 at 3:30 PM
So how would I do that? Looking at the example (shown below), I see a check for compute capability, but there's no explicit set as far as I can tell:
CudafyModule km = CudafyTranslator.Cudafy();

GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target, CudafyModes.DeviceId);
if (gpu is CudaGPU && gpu.GetDeviceProperties().Capability < new Version(1, 2))
{
    Console.WriteLine("Compute capability 1.2 or higher required for atomics.");
    return -1;
}
gpu.LoadModule(km);
Thank you for your help!
Jun 25, 2013 at 4:35 PM
It's an argument of CudafyTranslator.Cudafy(...)
Jun 25, 2013 at 4:59 PM
Edited Jun 25, 2013 at 4:59 PM
Thank you - that worked perfectly.

For reference if anyone else has this problem:
CudafyModule km = CudafyTranslator.Cudafy(eArchitecture.sm_30);
You can also use:
eArchitecture.OpenCL;
eArchitecture.sm_11;
eArchitecture.sm_12;
eArchitecture.sm_13;
eArchitecture.sm_20;
eArchitecture.sm_21;
eArchitecture.sm_30;
eArchitecture.sm_35;
Depending on your GPU's compute capability, which can be found here: https://en.wikipedia.org/wiki/CUDA#Supported_GPUs

Thank you again,

Zach