Getting garbage values in the output array

Jan 17, 2013 at 3:39 PM

Hi

I have two structs: Outages and ResultClass (declared inside Outages).

 
    [Serializable()]
    [Cudafy]
    public struct Outages
    {
        public int Year;
        public int StationNo;
        public int CustomerImpacted;

        [Serializable()]
        [Cudafy]
        [StructLayout(LayoutKind.Sequential)]
        public struct ResultClass
        {
            public int year;
            public int customersImpacted;
            public int stationNo;
        }
    }
I have two input arrays. The first is of type Outages and contains arbitrary records such as:
StationNo  Year  CustomersImpacted
1          1950  200
1          1950  250
2          1954  300
2          1952  200
1          1951  190
.........
......... and so on.
There can be any number (N) of records, with years ranging from 1950 to 2012 and station numbers from 1 to 6.
The second array is of type ResultClass and acts as a master array of every StationNo/Year combination, with CustomersImpacted set to 0:
StationNo		Year		CustomersImpacted
1		1950		0
1		1951		0
1		1952		0
....
1		2012		0
2		1950		0	
2		1951		0
.....
2		2012		0
.....
.....
.....
6		1950		0
6		1951		0
.....
6		2012		0
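Because the master array lists every combination in a fixed order (stations 1 to 6, and within each station the years 1950 to 2012), a record's slot can be computed directly instead of being found by scanning. The sketch below assumes exactly that ordering; the class name MasterIndex and its constants are hypothetical and not part of the original program.

```csharp
using System;

// Hypothetical helper: maps (StationNo, Year) to its slot in the master
// ResultClass array, ASSUMING the array is ordered station-major as listed
// above (stations 1..6, and years 1950..2012 within each station).
public static class MasterIndex
{
    public const int FirstYear = 1950;
    public const int LastYear = 2012;
    public const int StationCount = 6;
    public const int YearsPerStation = LastYear - FirstYear + 1; // 63 years

    // Total number of master entries: 6 stations * 63 years.
    public static int Length => StationCount * YearsPerStation;

    public static int Of(int stationNo, int year)
    {
        // Range checks are for the CPU only; they reject keys outside the master array.
        if (stationNo < 1 || stationNo > StationCount)
            throw new ArgumentOutOfRangeException(nameof(stationNo));
        if (year < FirstYear || year > LastYear)
            throw new ArgumentOutOfRangeException(nameof(year));

        // Skip the blocks of earlier stations, then offset by year within the block.
        return (stationNo - 1) * YearsPerStation + (year - FirstYear);
    }
}
```

For example, MasterIndex.Of(2, 1950) is 63, the first entry after station 1's 63 years. Assuming the master array really is ordered this way, the same arithmetic (without the exception-throwing range checks) could stand in for the inner loop over resultsLength.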
My basic requirement is to group the records of the first array: for each station number, the total number of customers impacted in each year.
The end result should look something like this:
StationNo		Year		CustomersImpacted
1		1950		500
1		1951		700
1		1952		400
....
1		2012		500
2		1950		600	
2		1951		700
.....
2		2012		300
.....
.....
.....
6		1950		400
6		1951		500
.....
6		2012		500
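A plain CPU version of this grouping gives a reference to compare the GPU output against. This is a sketch: OutageRecord is a hypothetical stand-in with the same field names as Outages, and a dictionary keyed by (station, year) is used instead of the master array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Outages struct (same field names).
public struct OutageRecord
{
    public int Year;
    public int StationNo;
    public int CustomerImpacted;

    public OutageRecord(int stationNo, int year, int customerImpacted)
    {
        StationNo = stationNo;
        Year = year;
        CustomerImpacted = customerImpacted;
    }
}

public static class CpuReference
{
    // Sums CustomerImpacted per (StationNo, Year) pair, sequentially on the CPU.
    public static Dictionary<(int Station, int Year), int> Group(OutageRecord[] outages)
    {
        var totals = new Dictionary<(int Station, int Year), int>();
        foreach (var o in outages)
        {
            var key = (o.StationNo, o.Year);
            totals.TryGetValue(key, out int sum); // sum stays 0 if the key is new
            totals[key] = sum + o.CustomerImpacted;
        }
        return totals;
    }
}
```

With the five sample rows from the first table, station 1 / 1950 sums to 200 + 250 = 450. Any GPU result that disagrees with this reference for the same input points to a bug in the kernel or in the memory handling around it.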
There is another class, OutageMain, in which I implement the CUDA part of the program:
public class OutageMain
    {
        private GPGPU _gpu;
        private OutageAnalysis_New.Outages.ResultClass[] results_dev;
        private Outages[] arrOutages_dev;
        private const int ciTHREADSPERBLOCK = 256;
        public OutageMain()
        {
            var mod = CudafyModule.TryDeserialize("outages");
            _gpu = CudafyHost.GetDevice();
            if (mod == null || !mod.TryVerifyChecksums())
            {
                mod = CudafyTranslator.Cudafy(ePlatform.All, eArchitecture.sm_20, typeof(Outages), typeof(OutageMain));
                mod.Serialize("outages");
            }
            // Get the default GPU device and load the module.
           
            _gpu.LoadModule(mod);
        }

        private int GetBlocksPerGrid(int length)
        {
            int blocksPerGrid = (length + (ciTHREADSPERBLOCK) - 1) / (ciTHREADSPERBLOCK);
            return blocksPerGrid;

        }
        public void LoadOutages(Outages[] arrOutages, OutageAnalysis_New.Outages.ResultClass[] results)
        {

            _gpu.FreeAll();
            results_dev = _gpu.Allocate(results);
            arrOutages_dev = _gpu.Allocate(arrOutages);
            int[] r = new int[results.Length];
            int[] dev_r = _gpu.Allocate(r);

            _gpu.CopyToDevice(arrOutages, arrOutages_dev);
            _gpu.CopyToDevice(results, results_dev);

            int blocksPerGrid = GetBlocksPerGrid(arrOutages.Length);

            _gpu.Launch(arrOutages.Length, 1, "GroupOutages", arrOutages_dev, arrOutages.Length, results_dev, arrOutages.Length, dev_r);
            _gpu.Synchronize();
            _gpu.CopyFromDevice(results_dev, results);
            _gpu.CopyFromDevice(dev_r, r);
            _gpu.FreeAll();
            Console.WriteLine("Success!");
            Console.ReadLine();
        }
        [Cudafy]
        public static void GroupOutages(GThread thread, Outages[] outages, int noPoints, OutageAnalysis_New.Outages.ResultClass[] results, int resultsLength, int[] r)
        {
            try
            {
                //int tid = thread.blockDim.x * thread.blockIdx.x + thread.threadIdx.x;
                int tid = thread.blockIdx.x;
                if (tid < noPoints)
                {
                    Outages tp = outages[tid];
                    for (int i = 0; i < resultsLength; i++)
                    {
                        OutageAnalysis_New.Outages.ResultClass target = results[i];
                        if (target.year == tp.Year && target.stationNo == tp.StationNo)
                        {
                            r[i] = r[i] + tp.CustomerImpacted;
                        }
                    }
                    thread.SyncThreads();
                }
            }
            catch (CudafyException e)
            {
                throw (e);
            }
            catch (CudafyFatalException c)
            {
                throw (c);
            }
        }
    }
Next, I match the output array against the master array of type ResultClass to get the desired results.
Sometimes the output array contains the correct results, but most of the time it contains garbage values. I am a complete beginner in CUDA programming and am having a very hard time resolving this.
Please help me fix this issue.
Thanks and Regards
Sachin 
Coordinator
Jan 18, 2013 at 7:45 AM

Not sure if it matters, but you pass arrOutages.Length as the length of results in your call to Launch. Be aware that you can access Array.Length from within your kernel, so there is no need to pass lengths to the kernel explicitly.

Jan 19, 2013 at 10:40 AM
Edited Jan 19, 2013 at 10:42 AM

Inconsistent results could be due to:

1 - Use of uninitialized device memory buffers (for example, dev_r is allocated, but r is never copied into it).

2 - Thread synchronization problems: somewhere a thread.SyncThreads() call is required.

3 - Use of thread.SyncThreads() within a conditional block of code that only some threads in the block enter. That would be your case once you uncomment //int tid = thread.blockDim.x * thread.blockIdx.x + thread.threadIdx.x;

Try this:

...
                if (tid < noPoints)
                {
                    Outages tp = outages[tid];
                    for (int i = 0; i < resultsLength; i++)
                    {
                        OutageAnalysis_New.Outages.ResultClass target = results[i];
                        if (target.year == tp.Year && target.stationNo == tp.StationNo)
                        {
                            r[i] = r[i] + tp.CustomerImpacted;
                        }
                    }
                }
                thread.SyncThreads();
...
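The point of item 3 is that a barrier only works if every thread in the block reaches it. The CPU sketch below uses .NET's System.Threading.Barrier as a stand-in for thread.SyncThreads(): only threads with tid < activeCount do the conditional work, but every thread calls SignalAndWait outside the if. The names BarrierDemo, RunPhases and activeCount are hypothetical. Note that a barrier only orders phases; by itself it does not make concurrent updates to the same r[i] safe.

```csharp
using System.Threading;

public static class BarrierDemo
{
    // Runs 'participants' threads in two phases separated by a barrier.
    public static int[] RunPhases(int participants, int activeCount)
    {
        int[] stage1 = new int[participants];
        int[] stage2 = new int[participants];

        using (var barrier = new Barrier(participants))
        {
            var threads = new Thread[participants];
            for (int t = 0; t < participants; t++)
            {
                int tid = t; // copy for the closure
                threads[t] = new Thread(() =>
                {
                    if (tid < activeCount)       // like: if (tid < noPoints)
                        stage1[tid] = tid + 1;

                    barrier.SignalAndWait();     // reached by EVERY thread, like SyncThreads() outside the if

                    // After the barrier, all phase-1 writes are done; read a neighbour's value.
                    stage2[tid] = stage1[(tid + 1) % participants];
                });
                threads[t].Start();
            }
            foreach (var th in threads) th.Join();
        }
        return stage2;
    }
}
```

With RunPhases(4, 3), thread 3 skips the conditional write but still reaches the barrier, so the result is [2, 3, 0, 1]. If SignalAndWait were moved inside the if, thread 3 would never signal and the other three would wait forever; on the GPU, the equivalent misuse of SyncThreads() is undefined behaviour.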
Jan 21, 2013 at 10:39 AM

None of the above helps; I am still facing the same issue. Could you please try to reproduce the scenario on your end and see whether you get the same problem? I don't know what I'm doing wrong.

Jan 21, 2013 at 1:03 PM
Edited Jan 21, 2013 at 9:18 PM

Consider this line:

r[i] = r[i] + tp.CustomerImpacted;

 Let's assume, for example, that i == 9.

Now imagine, say, 100 threads all doing

r[9] += tp.CustomerImpacted;

simultaneously (because that's how the GPU works). Each thread reads r[9], adds its value, and writes the result back to r[9], in no particular order. So r[9]'s final value is undefined: while one thread is reading, another may already be writing, and updates get lost.
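The lost-update problem is not specific to GPUs, so it can be reproduced on the CPU. In this sketch (hypothetical class RaceDemo), many Parallel.For iterations perform the same read-modify-write on r[9]. The unsynchronized version can lose updates; Interlocked.Add, .NET's atomic add and the CPU counterpart of CUDA's atomicAdd, always produces the full total.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class RaceDemo
{
    // Same pattern as r[i] = r[i] + tp.CustomerImpacted, run by many workers at once.
    public static int UnsafeSum(int workers, int value)
    {
        int[] r = new int[10];
        Parallel.For(0, workers, _ =>
        {
            r[9] = r[9] + value; // non-atomic read-modify-write: concurrent updates can be lost
        });
        return r[9];
    }

    // Same work, but each read-modify-write is a single indivisible operation.
    public static int AtomicSum(int workers, int value)
    {
        int[] r = new int[10];
        Parallel.For(0, workers, _ =>
        {
            Interlocked.Add(ref r[9], value);
        });
        return r[9];
    }
}
```

On a multi-core machine, UnsafeSum(100000, 1) can return less than 100000 and may differ between runs, while AtomicSum(100000, 1) always returns 100000.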

Perhaps what you need is an atomic add. I'm not sure whether atomics are available in CUDAfy's current version. If not, you can always have each thread work with a private copy of "r" and then merge them all on the CPU.
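The private-copy approach can be sketched on the CPU as follows. It is an illustration under assumptions, not CUDAfy code: the class PrivateCopies and the arrays slotOf (each record's index in the results array, which could come from the matching loop in GroupOutages) and amount (its CustomerImpacted) are hypothetical. Each worker owns its own resultsLength-sized slice of a partial buffer, so no two workers ever write the same element, and the slices are summed afterwards. On the GPU, partial would be a zero-initialized device array (copied from a zeroed host array) that is copied back with CopyFromDevice before the merge loop runs on the host.

```csharp
using System.Threading.Tasks;

public static class PrivateCopies
{
    public static int[] GroupAndMerge(int[] slotOf, int[] amount, int resultsLength, int workers)
    {
        // One private copy of r per worker; .NET zero-initializes new arrays.
        int[] partial = new int[workers * resultsLength];

        // "Kernel": worker w handles records w, w + workers, w + 2*workers, ...
        // and writes only inside its own slice, so there is no race.
        Parallel.For(0, workers, w =>
        {
            int baseOffset = w * resultsLength;
            for (int k = w; k < slotOf.Length; k += workers)
                partial[baseOffset + slotOf[k]] += amount[k];
        });

        // "CPU merge": add the private copies together.
        int[] r = new int[resultsLength];
        for (int w = 0; w < workers; w++)
            for (int i = 0; i < resultsLength; i++)
                r[i] += partial[w * resultsLength + i];
        return r;
    }
}
```

For example, GroupAndMerge(new[] { 0, 0, 5, 3, 1 }, new[] { 200, 250, 300, 200, 190 }, 6, 3) returns [450, 190, 0, 200, 0, 300]. The trade-off is memory: the buffer holds workers × resultsLength ints, so the number of private copies has to stay modest.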

Jan 22, 2013 at 1:39 PM

Is there a code sample showing how to use a private variable and send it back to the CPU?

 

Coordinator
Jan 22, 2013 at 2:03 PM
Edited Jan 22, 2013 at 2:05 PM

You already copy data back to the CPU in your code above. Any variable local to a kernel must be written to global memory before it can be copied back; you cannot return values from kernels. Best to start here: http://cudafy.codeplex.com/documentation