
How to fill a vector randomly, i.e. how to use the Random class in a kernel function

Oct 23, 2013 at 12:14 PM
Hello!
I use MS VS 2010 and the Cudafy library. It is easy to make GPU applications with Cudafy, but I have run into the following problem: I have a vector and I want to fill it with random values using Cudafy. For example, consider the following code.

[Cudafy]
public static void func(GThread thread, double[] x, int size, Random z)
{
int idx = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
if(idx < size)
  x[idx] = z.Next(0, 99);  
}

I get an error when compiling my program. Can anybody help me solve this problem? Thanks!
Oct 24, 2013 at 8:27 AM
The Random class is not supported on the GPU. Instead, take a look at the Cudafy.Math namespace and the GPGPURand class. You can see how it is used in the Cudafy.Host.UnitTests and Cudafy.Math.UnitTests projects in the source code.
Oct 24, 2013 at 9:08 AM
But what about the curand_init function? Does Cudafy have an analogue of this function?
Oct 24, 2013 at 10:03 AM
Did you look at the file CURANDTests.cs?
        public static void Basics()
        {
            CudafyModule cm = CudafyTranslator.Cudafy(CudafyModes.Architecture);
            Console.WriteLine(cm.CompilerOutput);
            GPGPU gpu = CudafyHost.GetDevice();
            gpu.LoadModule(cm);

            int i, total;
            RandStateXORWOW[] devStates = gpu.Allocate<RandStateXORWOW>(64 * 64);
            int[] devResults = gpu.Allocate<int>(64 * 64);
            int[] hostResults = new int[64 * 64];

            gpu.Set(devResults);

            gpu.Launch(64, 64, "setup_kernel", devStates);
            for (i = 0; i < 10; i++)
                gpu.Launch(64, 64, "generate_kernel", devStates, devResults);

            gpu.CopyFromDevice(devResults, hostResults);

            total = 0;
            for (i = 0; i < 64 * 64; i++)
                total += hostResults[i];
            Console.WriteLine("Fraction with low bit set was {0}", (float) total / (64.0f * 64.0f * 100000.0f * 10.0f));

            gpu.FreeAll();
        }


        [Cudafy]
        public static void setup_kernel(GThread thread, RandStateXORWOW[] state)
        {
            int id = thread.threadIdx.x + thread.blockIdx.x * 64;
            thread.curand_init(1234, (ulong)id, 0, ref state[id]);
        }

        [Cudafy]
        public static void generate_kernel(GThread thread, RandStateXORWOW[] state, int[] result)
        {
            int id = thread.threadIdx.x + thread.blockIdx.x * 64;
            int count = 0;
            uint x = 0;

            /* Copy state to local memory for efficiency */
            RandStateXORWOW localState = state[id];
            /* Generate pseudo-random unsigned ints */
            for (int n = 0; n < 100000; n++)
            {
                x = thread.curand(ref localState);
                /* Check if low bit set */
                if ((x & 1) == 1)
                {
                    count++;
                }
            }
            /* Copy state back to global memory */
            state[id] = localState;
            /* Store results */
            result[id] += count;
        }
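
Applied to the original question (filling a double[] with values like z.Next(0, 99)), the same pattern could look roughly like the sketch below. It only uses the thread.curand_init and thread.curand calls shown above. The class name RandomFill, the helper MapToRange and the kernel name fill_kernel are hypothetical names made up for illustration, and depending on the Cudafy version an extra using directive may be needed for RandStateXORWOW. Note that taking the raw value modulo the range introduces a very small bias, which is usually acceptable for a simple fill.

```csharp
using Cudafy;

public static class RandomFill
{
    // Hypothetical helper (not part of Cudafy): maps a raw 32-bit random
    // value into [min, max), mirroring Random.Next(min, max).
    [Cudafy]
    public static double MapToRange(uint raw, int min, int max)
    {
        uint span = (uint)(max - min);
        return (double)(min + (int)(raw % span));
    }

    // One generator state per vector element, seeded as in setup_kernel above,
    // but indexed with blockDim.x instead of a hard-coded 64.
    [Cudafy]
    public static void setup_kernel(GThread thread, RandStateXORWOW[] state, int size)
    {
        int idx = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (idx < size)
            thread.curand_init(1234, (ulong)idx, 0, ref state[idx]);
    }

    // Hypothetical kernel: writes one random value in [0, 99) to each element of x.
    [Cudafy]
    public static void fill_kernel(GThread thread, RandStateXORWOW[] state, double[] x, int size)
    {
        int idx = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (idx < size)
        {
            // Work on a local copy of the state, then store it back
            RandStateXORWOW localState = state[idx];
            uint r = thread.curand(ref localState);
            x[idx] = MapToRange(r, 0, 99);
            state[idx] = localState;
        }
    }
}
```

On the host this would be driven the same way as Basics() above: allocate size states with gpu.Allocate<RandStateXORWOW>(size), launch setup_kernel once, launch fill_kernel, and read x back with gpu.CopyFromDevice.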
Oct 24, 2013 at 11:14 AM
Yes, thank you very much!
Dec 25, 2015 at 5:30 PM
Edited Dec 26, 2015 at 12:05 AM
Reviving this post. I've replicated CURANDHostTests.cs in my code, but the performance is abysmal. It seems that GenerateUniform is not being parallelized. What am I missing? Or how can I embed the following into a kernel?
gen.GenerateUniform(devData);