Creating an array, in global memory, from within a kernel

Jul 5, 2013 at 12:29 PM
Is it possible to create a dynamic array, in global memory, from within a kernel? I.e. I'd like to have a local array for each of my threads, where each thread's array will be a different length. I'm not too fond of the idea of passing in several hundred parameters to a kernel. On a par with stabbing oneself in the eye with a pencil.

The following seems to suggest it's possible in CUDA: Question on SO

I guess I could use a 2D array of the max size I require, but it will take up more memory.

Any ideas?
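For reference, the usual alternative to passing hundreds of parameters is to pack all the per-thread arrays into one buffer and pass per-thread offsets. This is a minimal plain-CUDA sketch; the names (`offsets`, `lengths`, `pool`, `perThreadArrays`) are illustrative, not from any real API:

```cuda
#include <cuda_runtime.h>

// Each thread gets its own variable-length slice of one big pool.
// Host side: compute `offsets` as an exclusive prefix sum of `lengths`,
// then cudaMalloc one pool of sum(lengths) floats and launch once.
__global__ void perThreadArrays(const int *offsets, const int *lengths,
                                float *pool)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float *myArray = pool + offsets[tid];   // this thread's slice
    for (int i = 0; i < lengths[tid]; ++i)
        myArray[i] = 0.0f;                  // work on the slice
}
```

This keeps the kernel signature down to three pointers regardless of how many threads (and arrays) there are.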
Jul 5, 2013 at 2:24 PM
Edited Jul 5, 2013 at 2:28 PM
Hi
Yes, CUDA lets you allocate dynamic device memory from within a kernel, if your compute capability is high enough (2.0 or later).
Alas, that's not yet in place in Cudafy, as far as I am aware.
Even if it were, I suspect it would carry a sizeable performance overhead, since memory allocation isn't a trivial operation. Of course, I've never tested it, so I could be wrong.
If you can spare the memory, then yes, I think your 2D array option would be the best.
All other options would drastically reduce performance.
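For completeness, this is what in-kernel allocation looks like in plain CUDA C (not available through Cudafy, as noted above). A minimal sketch, assuming compute capability 2.0 or later:

```cuda
#include <cuda_runtime.h>

// Each thread allocates its own n-element array from the device heap.
__global__ void kernelAlloc(int n)
{
    float *local = (float *)malloc(n * sizeof(float));
    if (local == NULL) return;              // the device heap may be exhausted
    for (int i = 0; i < n; ++i)
        local[i] = (float)i;                // use the per-thread array
    free(local);                            // avoid leaking device heap memory
}

int main()
{
    // The device heap is small by default (8 MB); raise the limit
    // before the first launch if threads allocate a lot.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);
    kernelAlloc<<<4, 256>>>(32);
    cudaDeviceSynchronize();
    return 0;
}
```

Note that device-side `malloc` serializes across threads to some degree, which is part of the performance overhead mentioned above.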

Edit: A good way to reduce the memory usage of the 2D approach would be to use as many threads as possible within each block (as long as occupancy isn't degraded), and work your algorithm around it: launch the same kernel in a loop, with just a few blocks each holding lots of threads, thereby reusing your 2D array.
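The looped-launch idea above can be sketched like this in plain CUDA C. All names (`chunkKernel`, `scratch`, `totalItems`, the sizes) are illustrative assumptions; the point is that one fixed-size 2D scratch buffer is reused across every launch:

```cuda
#include <cuda_runtime.h>

// Processes one chunk of the problem; `scratch` holds one maxLen-float
// row per launched thread, reused on every pass through the loop.
__global__ void chunkKernel(float *scratch, int maxLen,
                            int baseItem, int totalItems)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int item = baseItem + tid;
    if (item >= totalItems) return;
    float *row = scratch + tid * maxLen;    // this thread's row
    for (int i = 0; i < maxLen; ++i)
        row[i] = (float)item;               // work on the row
}

int main()
{
    const int blocks = 4, threads = 512;    // few blocks, many threads
    const int maxLen = 1024, totalItems = 1 << 20;

    // One scratch buffer sized for a single launch, not the whole problem.
    float *scratch;
    cudaMalloc(&scratch, (size_t)blocks * threads * maxLen * sizeof(float));

    int perLaunch = blocks * threads;
    for (int base = 0; base < totalItems; base += perLaunch)
        chunkKernel<<<blocks, threads>>>(scratch, maxLen, base, totalItems);

    cudaDeviceSynchronize();
    cudaFree(scratch);
    return 0;
}
```

The buffer is `blocks * threads * maxLen` floats instead of `totalItems * maxLen`, which is the memory saving the edit describes.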
Jul 5, 2013 at 4:26 PM
@Pedritolo1 Hi,
I've seen your name a few times on the forum, so if you say it's a non-trivial matter, then I have no doubt it is.

Yeah, I was looking into the memory requirements a little more, and I think I will have to split it up and call the same kernel several times. As long as it's faster than the CPU I will be happy, as that was the main reason for learning Cudafy in the first place.

Thanks again for your help/pointers, I really appreciate it.