Declaring a private, thread-specific variable in a kernel and then returning that variable to the host

Jun 6, 2013 at 12:45 PM
I have a method which I want to run on several threads, but each thread will return a different number of results. Is it possible to declare a private, thread-specific variable, e.g. a List<int>, which I can then pass back to the host so that all the results can be merged?

Say I have two arrays as follows:
int[,] arr1 = new int[3,3] {{ 3, 4, 5 }, {4, 5, 6}, {1, 6, 4}};
int[] arr2 = new int[] { 3, 4, 1 };
Each thread will be given 3 values to analyze and will record the difference between its value in arr2 and the values in the corresponding row of arr1:
public static void CountAbove(GThread thread, int[] a, int[,] b, List<int> c)
{
    int tid = thread.blockIdx.x;   // one block per row of b
    int threshold = a[tid];

    // Walk the columns of row tid (dimension 1, not dimension 0).
    for (int i = 0; i < b.GetLength(1); i++)
    {
        if (threshold < b[tid, i])
            c.Add(b[tid, i] - threshold);
    }
}
NOTE: There will be cases where I don't know how large the results may be. Is it possible to use a List<int> or List<Struct> instead of an array whose length I can only hope won't be exceeded, or are Lists not supported?
Jun 8, 2013 at 5:38 AM
No, this is not possible. Your best option is to use shared and global memory with enough space for the maximum number of elements, and to track the number actually used. Then, after synchronizing the threads within the same kernel, you can do your (partial) merge there. This will be more efficient than doing the merge on the host. Take a look at any CUDA reduction example for tips.
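To illustrate the layout described above, here is a minimal sketch in plain C# that runs on the CPU. It is not CUDAfy code and makes several assumptions: the class name FixedCapacityResults, the results and counts parameters, and the Merge and Run helpers are all hypothetical, and the tid parameter stands in for thread.blockIdx.x. Each thread gets a fixed-capacity row of results (capacity equals the row length, which is the worst case for this example) plus a counter of how many slots it actually filled. The merge is done sequentially here for simplicity; on the device it would follow the reduction pattern mentioned above.

```csharp
using System;

public static class FixedCapacityResults
{
    // Stand-in for the kernel body. In a real kernel, tid would come from
    // thread.blockIdx.x; results and counts would live in global memory.
    public static void CountAbove(int tid, int[] a, int[,] b, int[,] results, int[] counts)
    {
        int threshold = a[tid];
        int n = 0;                                   // slots used by this thread
        for (int i = 0; i < b.GetLength(1); i++)
        {
            if (threshold < b[tid, i])
            {
                results[tid, n] = b[tid, i] - threshold;
                n++;
            }
        }
        counts[tid] = n;                             // record how many are valid
    }

    // Compacts the used part of every row into one array, in thread order.
    public static int[] Merge(int[,] results, int[] counts)
    {
        int total = 0;
        for (int t = 0; t < counts.Length; t++) total += counts[t];

        int[] merged = new int[total];
        int pos = 0;
        for (int t = 0; t < counts.Length; t++)
            for (int k = 0; k < counts[t]; k++)
                merged[pos++] = results[t, k];
        return merged;
    }

    public static int[] Run(int[] a, int[,] b)
    {
        int rows = b.GetLength(0);
        int[,] results = new int[rows, b.GetLength(1)];  // worst case: every value passes
        int[] counts = new int[rows];
        for (int tid = 0; tid < rows; tid++)             // sequential stand-in for the launch
            CountAbove(tid, a, b, results, counts);
        return Merge(results, counts);
    }

    public static void Main()
    {
        int[,] arr1 = { { 3, 4, 5 }, { 4, 5, 6 }, { 1, 6, 4 } };
        int[] arr2 = { 3, 4, 1 };
        Console.WriteLine(string.Join(", ", Run(arr2, arr1)));  // prints "1, 2, 1, 2, 5, 3"
    }
}
```

Because every thread writes only to its own row and its own counter, no atomics are needed in this layout. The cost is memory: the buffer is sized for the maximum, not for what is actually produced.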