Copy unmanaged System.IntPtr byte vector into row of 2D device byte array

Dec 25, 2014 at 4:09 PM
I have a video frame grabber card that collects byte[1024 x 1024] image data at 30 FPS. Every 33.3 ms it fills a slot in a circular buffer and returns a System.IntPtr that points to that unmanaged 1D vector of bytes (byte*). The circular buffer has 15 slots.

On the GPU device (Tesla K40) I want a global buffer organized as a dense 2D array. That is, I want something like the circular queue, but resident on the GPU:
byte[15, 1024*1024] rawdata; 
// If CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024], but it does not.
How can I fill in a different row every 33 ms? Do I use something like:
gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length) // length = 1024*1024
// offset is computed as rowID*(1024*1024), where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the (unmanaged) buffer in the circular queue.
// rawdata is a device buffer allocated with gpu.Allocate<byte>(1024*1024);
And my kernel signature is:
[Cudafy]
public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)
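To make the offset arithmetic in the question concrete, here is a minimal host-only sketch in plain C#. It does not touch CUDAfy at all. It only illustrates that a rectangular byte[15, 1024*1024] array is stored row-major, so writing FrameSize bytes at flat offset slot*FrameSize fills exactly one row. The names FrameRing, SlotFor, OffsetFor and CopyFrameToRow are hypothetical helpers introduced for illustration and are not part of the original post or of any library.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FrameRing
{
    public const int SlotCount = 15;            // slots in the circular buffer
    public const int FrameSize = 1024 * 1024;   // bytes per frame

    // Map a running frame counter to a slot (row) index; wraps to 0 after 14.
    public static int SlotFor(long frameNumber) => (int)(frameNumber % SlotCount);

    // Offset of the first byte of a slot when the 2D array is viewed as one
    // flat, row-major block. The maximum value (14 * FrameSize) fits in an int.
    public static int OffsetFor(int slot) => slot * FrameSize;

    // Host-side illustration only: copy one unmanaged frame into row `slot`
    // of a managed byte[SlotCount, FrameSize] array via a staging buffer.
    public static void CopyFrameToRow(IntPtr frame, byte[,] ring, int slot)
    {
        byte[] staging = new byte[FrameSize];
        Marshal.Copy(frame, staging, 0, FrameSize);
        // Buffer.BlockCopy takes byte offsets and accepts rectangular
        // arrays of primitive types, treating them as one flat block.
        Buffer.BlockCopy(staging, 0, ring, OffsetFor(slot), FrameSize);
    }
}
```

The same flat-offset reasoning is what the device-side copy relies on, assuming the device allocation for the 2D array is one contiguous row-major block of 15*1024*1024 bytes.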
Dec 25, 2014 at 5:43 PM
Edited Dec 25, 2014 at 6:31 PM
I did try something along these lines, but CUDAfy has no overload of the form:
GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)
So I used the gpu.Cast function to view the 2D device array as a 1D array.

I tried the code below, but I am getting a CUDA.NET exception: ErrorLaunchFailed

When I try the CUDA emulator, it aborts on the CopyToDevice call, claiming that "Data is not host allocated".
        public static byte[] process(System.IntPtr data, int slot)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            byte[] output = new byte[FrameSize];
            int offset = slot*FrameSize;
            gpu.Lock();
            byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
            gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize);
            byte[] goutput = gpu.Allocate<byte>(output);
            gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
            runTime = watch.Elapsed.ToString();
            gpu.CopyFromDevice(goutput, output);
            gpu.Free(goutput);
            gpu.Synchronize();
            gpu.Unlock();
            watch.Stop();
            totalRunTime = watch.Elapsed.ToString();
            return output;
        }
Jan 1, 2015 at 6:12 PM
I think I made a dumb mistake: I am passing the cast rawdata[] array to the kernel launch instead of the grawdata[,] array. I only needed the rawdata[] array for the CopyToDevice call, so that I could apply an offset. Stupid me. I am going to go climb into my hole now.
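For anyone landing here later, the fix described above might look roughly like the sketch below. It is not a tested implementation. It reuses the names from the earlier post (gpu, grawdata, FrameSize, height, width, filter) and assumes that gpu is an initialized GPGPU with the module containing filter already loaded, that grawdata was allocated as a 15 x FrameSize device array, and that the size argument of gpu.Cast is the element count of the resulting 1D view. FrameProcessor and SlotCount are hypothetical names added for illustration.

```csharp
using System;
using Cudafy;
using Cudafy.Host;

public static class FrameProcessor
{
    const int SlotCount = 15;            // assumption: matches the circular buffer
    const int FrameSize = 1024 * 1024;

    // Assumed to be set up elsewhere, as in the earlier post.
    static GPGPU gpu;
    static byte[,] grawdata;             // assumed: gpu.Allocate<byte>(SlotCount, FrameSize)
    static int height, width;            // launch dimensions from the earlier post

    public static byte[] Process(IntPtr data, int slot)
    {
        byte[] output = new byte[FrameSize];
        gpu.Lock();
        try
        {
            // 1D view of the 2D device array, used only so the copy can take an offset.
            byte[] flat = gpu.Cast<byte>(grawdata, SlotCount * FrameSize);
            gpu.CopyToDevice<byte>(data, 0, flat, slot * FrameSize, FrameSize);

            byte[] goutput = gpu.Allocate<byte>(output);
            // Pass the original 2D device array, matching the byte[,] kernel parameter.
            gpu.Launch(height, width).filter(grawdata, FrameSize, goutput);
            gpu.CopyFromDevice(goutput, output);
            gpu.Free(goutput);
        }
        finally
        {
            gpu.Unlock();
        }
        return output;
    }
}
```

The only functional change from the earlier code is the first argument of the launch: the 1D view is used for the offset copy, and the 2D array goes to the kernel. This sketch does not address the emulator's "Data is not host allocated" complaint.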