Pointer arithmetic on the device

Nov 11, 2012 at 8:06 PM

Thanks for making such an awesome product!

I was particularly impressed to find, through trial and error, that pointer arithmetic can easily be accomplished in a kernel with CUDAfy. Example:

[Cudafy]
public unsafe static void MyKernelMethod(GThread thread, int[] a)
{
    // pin the array and get a pointer to it
    fixed (int* aPtr = a)
    {
        // get a pointer to some offset within the original array
        int* aPtr_Offset = aPtr + thread.blockIdx.x;
        // use it for something
        aPtr_Offset[thread.threadIdx.x] = 2;

        // some more code...
    }
}

is converted into correct, nvcc-ready C code:

extern "C" __global__ void MyKernelMethod(int* a, int aLen0)
{
	int* ptr = a;

	{
		int* ptr2 = ptr + blockIdx.x;
		ptr2[(threadIdx.x)] = 2;
	}
}

Bloody amazing!

But did you notice the last argument in the converted C code (int aLen0)? Is there a way to do without it?

Once again, thanks for such awesome software.

Coordinator
Nov 13, 2012 at 8:59 AM

Thanks for your comments. The aLen0 parameter is there for a reason: it is what allows the Length property and the GetLength method to be used on the array inside kernel code. The correct value is passed automatically when you call Launch, so you never supply it yourself. Try using Length in your kernel code, say by checking that thread.threadIdx.x is within the bounds of the array a.
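
A minimal sketch of such a bounds check, for illustration only. It assumes the usual CUDAfy.NET host calls (CudafyTranslator.Cudafy, CudafyHost.GetDevice, Allocate, Launch, CopyFromDevice, FreeAll); the class name LengthCheckSketch, the kernel name FillChecked and the sizes are made up here, and the code has not been verified against any particular CUDAfy version:

```csharp
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

public class LengthCheckSketch
{
    // Hypothetical kernel: each thread writes one element, but only if its
    // global index falls inside the array. a.Length relies on the hidden
    // aLen0 argument in the generated code.
    [Cudafy]
    public static void FillChecked(GThread thread, int[] a)
    {
        int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (i < a.Length)
            a[i] = 2;
    }

    public static void Main()
    {
        // Translate the [Cudafy] members of this class and load them on the GPU.
        CudafyModule km = CudafyTranslator.Cudafy();
        GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda);
        gpu.LoadModule(km);

        int[] host = new int[1000];
        int[] dev = gpu.Allocate<int>(host);

        // 4 blocks of 256 threads = 1024 threads for 1000 elements, so the
        // last 24 threads are rejected by the Length check.
        // Only the array is passed; the length argument is not supplied here.
        gpu.Launch(4, 256).FillChecked(dev);

        gpu.CopyFromDevice(dev, host);
        gpu.FreeAll();
    }
}
```

Without the check, the surplus threads in the last block would write past the end of the array.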

Nov 13, 2012 at 4:31 PM

Indeed. Alas, those extra args bloat the kernel stack, and most of the time (at least in my case) they are useless. May I suggest a simple solution? Marking the array argument in the function declaration with an optional attribute would suppress the creation of the hidden length argument (and, of course, the use of that array's Length property within device code). Something like this:

[Cudafy]
public unsafe static void MyKernelMethod(GThread thread, [Cudafy_NoLength] int[] a)
{
    // ...
}

Nov 13, 2012 at 6:34 PM

It's been ages since I last read or wrote assembler code, but a casual glance suggests that those extra variables, when unused, are optimized out by nvcc. If so, they only clutter the .cu file, not the kernels.

Oct 6, 2013 at 10:56 PM

@pedritolo1 Is the "fixed" keyword necessary when the code will be running on a gpu? Can a gpu relocate data?

Oct 7, 2013 at 8:58 AM

j44p wrote:
@pedritolo1 Is the "fixed" keyword necessary when the code will be running on a gpu? Can a gpu relocate data?

I'm sorry, I don't understand your question.