Jagged Arrays

Jan 3, 2012 at 9:33 AM

Nick,

I think I'm rather close to finishing some changes to Cudafy that should add some support for jagged arrays. Along with this, it should support N-dimensional jagged arrays and multidimensional arrays. I'm hoping you will be interested in reviewing the code and applying it to Cudafy.

So far I have...

[Runtime]
Allocate (N-dim)
CopyToDevice (testing with 2D; should work for N-dim though)
CopyFromDevice (testing with 2D; should work for N-dim though)

[C# conversion to C]
Method signatures for passing jagged arrays
Method body for accessing jagged arrays
Using GetLength(x) on each dimension of the jagged array

I didn't test converting any methods or properties other than GetLength. To get to a workable stage, all I have left is handling the launch of the methods that use jagged arrays and freeing the memory used.

Do you want me to upload the code as a patch when I'm finished?

Jan 4, 2012 at 2:22 PM

I uploaded the rough draft of jagged array support as a patch, in case anyone needs it or wants to fully implement it.

Jan 5, 2012 at 4:28 PM

Hi there,

Sounds very interesting.  I'll take a look in the coming week. 

Thank you!

Nick

Jan 5, 2012 at 6:40 PM

Initial look - did you put any unit tests in there?  For copying to/from device did you do any performance testing?

Can you also provide just the changed / added files?  You can also submit via email.

Thanks.

Jan 5, 2012 at 10:59 PM

Nick,

Here are the 2 simple tests that I was using

            short[][, ,] hostTest = new short[10][, ,];
            for(int i = 0; i < 10; i++) {
                hostTest[i] = new short[i + 1, i + 1, i + 1];
                for (int j = 0; j < i + 1; j++)
                {
                    for (int k = 0; k < i + 1; k++)
                    {
                        for (int l = 0; l < i + 1; l++)
                        {
                            hostTest[i][j, k, l] = (short)i;
                        }
                    }
                }
            }
            short[][, ,] devtest = gpu.CopyToDevice(hostTest);
            gpu.Launch(10, 1).test_jagged_array(devtest);
            gpu.CopyFromDevice(devtest, hostTest);

        [Cudafy]
        public static void test_jagged_array(GThread thread, short[][, ,] test)
        {
            int id = thread.blockIdx.x;
            
            for (int x = 0; x < test[id].GetLength(0); x++)
            {
                for (int y = 0; y < test[id].GetLength(1); y++)
                {
                    for (int z = 0; z < test[id].GetLength(2); z++)
                    {
                        test[id][x, y, z] = 25;
                    }
                }
            }
            test[id][0, 0, 0] = (short)test[id].GetLength(0);
        }

            short[][][, ,] hostTest2 = new short[10][][, ,];
            for (int i = 0; i < 10; i++)
            {
                hostTest2[i] = new short[5][,,];
                for (int b = 0; b < 5; b++)
                {
                    hostTest2[i][b] = new short[i + 1, i + 1, i + 1];
                    for (int j = 0; j < i + 1; j++)
                    {
                        for (int k = 0; k < i + 1; k++)
                        {
                            for (int l = 0; l < i + 1; l++)
                            {
                                hostTest2[i][b][j, k, l] = (short)i;
                            }
                        }
                    }
                }
            }
            short[][][, ,] devtest2 = gpu.CopyToDevice(hostTest2);
            gpu.Launch(10, 5).test_jagged_array2(devtest2);
            gpu.CopyFromDevice(devtest2, hostTest2);

        [Cudafy]
        public static void test_jagged_array2(GThread thread, short[][][, ,] test)
        {
            int id = thread.blockIdx.x;
            int id2 = thread.threadIdx.x;

            for (int x = 0; x < test[id][id2].GetLength(0); x++)
            {
                for (int y = 0; y < test[id][id2].GetLength(1); y++)
                {
                    for (int z = 0; z < test[id][id2].GetLength(2); z++)
                    {
                        test[id][id2][x, y, z] = 25;
                    }
                }
            }
            test[id][id2][0, 0, 0] = (short)test[id].GetLength(0);
        }

I didn't get to do any performance testing as I had to switch back over to work projects.

I thought about uploading only the changed files, but I wasn't sure how good a compare program you had. I use Araxis Merge; it makes it easy to compare directory structures and files.

I changed the following files.

CUDA.cs
--------------------
CompilerOptions.cs
NvccCompilerOptions.cs
KernelModule.cs
--------------------
CudaGPU.cs
EmulatedGPU.cs
GPGPU.cs
--------------------
CudaBLAS.cs
--------------------
CSharpFormattingOptions.cs
OutputVisitor.cs (ICSharpCode)
CudafyTranslator.cs
CUDAOutputVisitor.cs
ExtensionMethods.cs



I just remembered I forgot to mention I did not complete the EmulatedGPU implementation. :(

Jan 23, 2012 at 8:14 AM
Edited Jan 23, 2012 at 8:17 AM

Did you get around to performance testing this?  I have my concerns regarding this since you need to send each part of the array separately.  Also how do you handle the possibility that someone creates a jagged array that is, erm, actually jagged?
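To illustrate the transfer concern: one common way to avoid sending each part separately is to pack the jagged array into a single contiguous buffer on the host, together with per-block offsets and dimensions. The following is only a host-side sketch under that assumption; `JaggedPacking`, `Pack`, and `IndexOf` are hypothetical names, not part of Cudafy or of the patch discussed here.

```csharp
using System;

// Hypothetical helper, not part of Cudafy: packs a jagged array of 3D blocks
// into one contiguous buffer so that a single host-to-device copy would suffice.
public static class JaggedPacking
{
    // offsets[i] is where block i starts in the packed buffer;
    // dims[i * 3 + d] is the length of block i in dimension d.
    public static short[] Pack(short[][,,] source, out int[] offsets, out int[] dims)
    {
        offsets = new int[source.Length + 1];
        dims = new int[source.Length * 3];
        for (int i = 0; i < source.Length; i++)
        {
            for (int d = 0; d < 3; d++)
                dims[i * 3 + d] = source[i].GetLength(d);
            // Array.Length is the total element count across all dimensions.
            offsets[i + 1] = offsets[i] + source[i].Length;
        }

        short[] flat = new short[offsets[source.Length]];
        for (int i = 0; i < source.Length; i++)
        {
            int pos = offsets[i];
            // foreach over a multidimensional array visits elements in row-major order.
            foreach (short value in source[i])
                flat[pos++] = value;
        }
        return flat;
    }

    // Position of element [x, y, z] of block i inside the packed buffer (row-major).
    public static int IndexOf(int[] offsets, int[] dims, int i, int x, int y, int z)
    {
        int ny = dims[i * 3 + 1];
        int nz = dims[i * 3 + 2];
        return offsets[i] + (x * ny + y) * nz + z;
    }
}
```

With this layout, a kernel would receive the flat buffer plus the two small index arrays and compute positions the same way `IndexOf` does, at the cost of losing the natural `test[id][x, y, z]` syntax.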

If it still looks useful, then I suggest creating a class derived from CudaGPU and putting the functionality into that.  The derived class could be called CudaGPUEx.  Extension methods may also work, but I am unsure how convenient they would be given the need for access to protected members.

Either solution prevents CudaGPU from getting too cluttered.
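The difference between the two options can be sketched in plain C#. `GpuBase`, `GpuBaseEx`, and every member below are hypothetical stand-ins chosen for illustration; they do not mirror the real CudaGPU members. The point is only that a derived class can reach protected members, while an extension method sees public ones only.

```csharp
using System;

// Hypothetical stand-in for a GPU wrapper class; NOT the real CudaGPU.
public class GpuBase
{
    // State that only the class itself and its subclasses can touch.
    protected int AllocationCount;

    protected void TrackAllocation()
    {
        AllocationCount++;
    }
}

// Option 1: a derived class keeps the jagged-array code out of the base class
// and still has access to protected members.
public class GpuBaseEx : GpuBase
{
    // Pretends to allocate one device buffer per row and returns the running count.
    public int AllocateJagged<T>(T[][] host)
    {
        for (int i = 0; i < host.Length; i++)
            TrackAllocation();
        return AllocationCount;
    }
}

// Option 2: an extension method can only use the public surface of GpuBase.
public static class GpuBaseExtensions
{
    public static int CountRows<T>(this GpuBase gpu, T[][] host)
    {
        // gpu.TrackAllocation();  // would not compile: TrackAllocation is protected
        return host.Length;
    }
}
```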

Jan 23, 2012 at 8:20 AM

I somewhat gave up on this solution, even though it appears to be working. It actually does support "jagged" arrays. The sample above is an actual jagged test: it creates and assigns 10 different arrays, with lengths going from 1 to 10.

The reason I gave up on it is that I'm looking at a different solution, one that doesn't need so much memory that it won't fit in a 1D array, since copying that amount of data to the graphics card is insanely slow. It was basically a workaround for something I shouldn't have been doing to start with. However, it is close to complete as working jagged array support, if someday you want to finish it just to have support for it.

Sep 7, 2013 at 1:10 PM
Hello,
First of all, I should say that I am very new to CUDA.
I am very interested in jagged arrays because I would like to do a bill of materials (BOM) calculation based on a parent-child structure.
I would like to use dynamic parallelism (with a GeForce GT 640) to calculate every product's cost based on material usage (each parent id looks up its child ids). The products would be the first dimension of the int array, and the BOM path the second dimension.
Does this plan make any sense, or do I have to choose another way to speed up the ordinary C# Parallel.ForEach loop?
Thank you in advance for your advice!
Br,
Peter
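
For reference, the parent-child layout described in this post can be sketched on the CPU as a jagged array of child ids per item, with the cost rolled up recursively. This is only a baseline sketch under assumptions not stated in the post: `BomCost`, `CostOf`, `CostAll`, the `quantities` array, and the `unitCost` array are all hypothetical, and the BOM is assumed to contain no cycles.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical CPU baseline for a BOM cost roll-up; all names are illustrative.
public static class BomCost
{
    // children[p] lists the child ids of item p; quantities[p][n] is how many
    // units of children[p][n] item p uses. Items without children are raw
    // materials priced by unitCost. Assumes the structure has no cycles.
    public static double CostOf(int item, int[][] children, double[][] quantities, double[] unitCost)
    {
        if (children[item].Length == 0)
            return unitCost[item];

        double total = 0;
        for (int n = 0; n < children[item].Length; n++)
            total += quantities[item][n] * CostOf(children[item][n], children, quantities, unitCost);
        return total;
    }

    // Computes the cost of every item in parallel; each iteration writes only its own slot.
    public static double[] CostAll(int[][] children, double[][] quantities, double[] unitCost)
    {
        double[] cost = new double[children.Length];
        Parallel.For(0, children.Length, p =>
        {
            cost[p] = CostOf(p, children, quantities, unitCost);
        });
        return cost;
    }
}
```

Shared sub-assemblies are recomputed for every parent in this sketch; caching their costs would be a natural next step before considering a GPU port.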